Published on November 4
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
cs.CL
cs.AI
Released Date: November 4, 2024
Authors: Kung-Hsiang Huang¹, Akshara Prabhakar¹, Sidharth Dhawan¹, Yixin Mao¹, Huan Wang¹, Silvio Savarese¹, Caiming Xiong¹, Philippe Laban¹, Chien-Sheng Wu¹
Aff.: ¹Salesforce AI Research
![CRMArena logo](https://arxiv.org/html/2411.02305v1/extracted/5977196/figures/crmarena_logo.png)
| Model | HTU | TCU | NED | TII | PVI |
| --- | --- | --- | --- | --- | --- |
| Act | | | | | |
| gpt-4o | 15.4 | 48.7 | 94.9 | 87.2 | 92.3 |
| gpt-4o-mini | 94.9 | 79.5 | 30.8 | 79.5 | 74.4 |
| claude-3.5-sonnet | 25.6 | 28.2 | 82.1 | 33.3 | 84.6 |
| claude-3-sonnet | 84.6 | 79.5 | 100.0 | 51.3 | 74.4 |
| llama3.1-405b | 56.4 | 51.3 | 46.2 | 38.5 | 0.0 |
| llama3.1-70b | 46.2 | 76.9 | 20.5 | 20.5 | 100.0 |
| ReAct | | | | | |
| gpt-4o | 64.1 | 48.7 | 100.0 | 84.6 | 74.4 |
| gpt-4o-mini | 97.4 | 82.1 | 97.4 | 61.5 | 71.8 |
| claude-3.5-sonnet | 12.8 | 7.7 | 87.2 | 30.8 | 82.1 |
| claude-3-sonnet | 79.5 | 84.6 | 94.9 | 69.2 | 94.9 |
| llama3.1-405b | 53.8 | 38.5 | 97.4 | 41.0 | 64.1 |
| llama3.1-70b | 64.1 | 41.0 | 97.4 | 17.9 | 17.9 |
| Function Calling | | | | | |
| gpt-4o | 59.0 | 84.6 | 74.4 | 100.0 | 35.9 |
| gpt-4o-mini | 15.4 | 7.7 | 0.0 | 0.0 | 0.0 |
| claude-3.5-sonnet | 52.6 | 74.4 | 100.0 | 100.0 | 100.0 |
| claude-3-sonnet | 2.6 | 15.4 | 59.0 | 38.5 | 56.4 |