Published on October 23
Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling
cs.CL
cs.AI
Release Date: October 23, 2024
Authors: Nirav Bhan¹, Shival Gupta¹, Sai Manaswini¹, Ritik Baba², Narun Yadav¹, Hillori Desai¹, Yash Choudhary³, Aman Pawar¹, Sarthak Shrivastava¹, Sudipta Biswas¹
Affiliations: ¹Floworks; ²Floworks and Indian Institute of Technology Kharagpur; ³Floworks and Indian Institute of Technology Bombay

| Model | Accuracy | Reliability | Latency (s) | Cost ($/1000 queries) |
|---|---|---|---|---|
| Floworks-ThorV2 | 90.1% | 100% | 2.29 | $1.60 |
| Claude-3 Opus | 78.2% | 59.7% | 15.3 | $46.7 |
| GPT-4o | 51.4% | 83.9% | 2.92 | $4.14 |
| GPT-4-turbo | 48.6% | 86.6% | 4.55 | $6.15 |
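
The latency and cost columns are easier to compare as ratios against a single row. The following is a minimal sketch, not part of the paper: it copies the latency and cost figures from the table and divides each by the Floworks-ThorV2 row. The names `ROWS` and `relative_to_baseline` are hypothetical and chosen only for illustration.

```python
# Rows copied from the table above:
# (model, latency in seconds, cost in $ per 1000 queries).
ROWS = [
    ("Floworks-ThorV2", 2.29, 1.60),
    ("Claude-3 Opus", 15.3, 46.7),
    ("GPT-4o", 2.92, 4.14),
    ("GPT-4-turbo", 4.55, 6.15),
]


def relative_to_baseline(rows, baseline="Floworks-ThorV2"):
    """Return {model: (latency_ratio, cost_ratio)} relative to the baseline row.

    Hypothetical helper for illustration; a ratio above 1 means the model is
    slower or more expensive than the baseline.
    """
    base = next(r for r in rows if r[0] == baseline)
    return {name: (lat / base[1], cost / base[2]) for name, lat, cost in rows}


if __name__ == "__main__":
    for name, (lat_x, cost_x) in relative_to_baseline(ROWS).items():
        print(f"{name}: {lat_x:.2f}x latency, {cost_x:.2f}x cost")
```

The ratios only restate the table's own numbers; they say nothing about accuracy or reliability, which the table reports separately.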