notesum.ai

Published on October 23

Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling

Categories: cs.CL, cs.AI

Release date: October 23, 2024

Authors: Nirav Bhan¹, Shival Gupta¹, Sai Manaswini¹, Ritik Baba², Narun Yadav¹, Hillori Desai¹, Yash Choudhary³, Aman Pawar¹, Sarthak Shrivastava¹, Sudipta Biswas¹

Affiliations: ¹Floworks; ²Floworks and Indian Institute of Technology Kharagpur; ³Floworks and Indian Institute of Technology Bombay

arXiv: https://arxiv.org/abs/2410.17950v1

Model	Accuracy	Reliability	Latency (s)	Cost ($/1000 queries)
Floworks-ThorV2	90.1%	100%	2.29	$1.60
Claude-3 Opus	78.2%	59.7%	15.3	$46.7
GPT-4o	51.4%	83.9%	2.92	$4.14
GPT-4-turbo	48.6%	86.6%	4.55	$6.15
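The table can be read more directly by computing relative figures. The Python sketch below recomputes each model's cost per single query and expresses its cost and latency as multiples of Floworks-ThorV2's. The values are copied by hand from the table above. `RESULTS`, `BASELINE`, and `relative_to_baseline` are hypothetical names introduced only for illustration. The sketch does not reproduce the paper's evaluation, and it makes no assumption about how Accuracy and Reliability are defined.

```python
# Values copied by hand from the results table: accuracy (%), reliability (%),
# latency (seconds), cost (USD per 1,000 queries). Illustrative only.
RESULTS = {
    "Floworks-ThorV2": (90.1, 100.0, 2.29, 1.60),
    "Claude-3 Opus": (78.2, 59.7, 15.3, 46.7),
    "GPT-4o": (51.4, 83.9, 2.92, 4.14),
    "GPT-4-turbo": (48.6, 86.6, 4.55, 6.15),
}

# Hypothetical choice of reference model for the ratios.
BASELINE = "Floworks-ThorV2"


def relative_to_baseline(results, baseline):
    """Hypothetical helper: per-query cost plus cost/latency multiples of the baseline."""
    _, _, base_latency, base_cost = results[baseline]
    ratios = {}
    for model, (_, _, latency, cost) in results.items():
        ratios[model] = {
            "cost_per_query_usd": cost / 1000,    # table cost is per 1,000 queries
            "cost_ratio": cost / base_cost,       # 1.0 means same cost as baseline
            "latency_ratio": latency / base_latency,
        }
    return ratios


if __name__ == "__main__":
    for model, r in relative_to_baseline(RESULTS, BASELINE).items():
        print(f"{model}: ${r['cost_per_query_usd']:.5f}/query, "
              f"{r['cost_ratio']:.1f}x cost, {r['latency_ratio']:.1f}x latency")
```

Run as a script, it prints one line per model. Using these figures, Claude-3 Opus comes out at about 29.2x the cost and 6.7x the latency of Floworks-ThorV2.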