Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
cs.AI
cs.MA
Released Date: November 7, 2024
Authors: Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang, Ricky Loynd, Robert West, Victor Dibia, Ahmed Awadallah, Ece Kamar, Rafah Hosn, Saleema Amershi
Aff.: Microsoft Research

| Method | GAIA | AssistantBench (EM) | AssistantBench (accuracy) | WebArena |
| --- | --- | --- | --- | --- |
| omne v0.1 (GPT-4o, o1) | 40.53 ± 5.6 | – | – | – |
| Trase Agent v0.2 (GPT-4o, o1, Gemini) | 39.53 ± 5.5 | – | – | – |
| Multi Agent (NA) | 38.87 ± 5.5 | – | – | – |
| das agent v0.4 (GPT-4o) | 38.21 ± 5.5 | – | – | – |
| Sibyl (GPT-4o) [56] | 34.55 ± 5.4 | – | – | – |
| HF Agents (GPT-4o) | 33.33 ± 5.3 | – | – | – |
| FRIDAY (GPT-4T) [61] | 24.25 ± 4.8 | – | – | – |
| GPT-4 + plugins [29] | 14.60 ± 4.0 | – | – | – |
| SPA CB (Claude) [71] | – | 13.8 ± 5.0 | 26.4 ± 6.4 | – |
| SPA CB (GPT-4T) [71] | – | 9.9 ± 4.3 | 25.2 ± 6.3 | – |
| Infogent (GPT-4o) | – | 5.5 ± 3.3 | 14.5 ± 5.1 | – |
| Jace.AI (NA) | – | – | – | 57.1 ± 3.4 |
| WebPilot (GPT-4o) [75] | – | – | – | 37.2 ± 3.3 |
| AWM (GPT-4) [57] | – | – | – | 35.5 ± 3.3 |
| SteP (GPT-4) [49] | – | – | – | 33.5 ± 3.2 |
| BrowserGym (GPT-4o) [10] | – | – | – | 23.5 ± 2.9 |
| GPT-4 | 6.67 ± 2.8 [29] | 6.1 ± 3.5 [71] | 16.5 ± 5.4 [71] | 14.9 ± 2.4 [79] |
| Human | 92.00 ± 3.1 | – | – | 78.2 ± 2.8 |
| Magentic-One (GPT-4o) | 32.33 ± 5.3 | 11.0 ± 4.6 | 25.3 ± 6.3 | 32.8 ± 3.2 |
| Magentic-One (GPT-4o, o1) | 38.00 ± 5.5 | 13.3 ± 4.9 | 27.7 ± 6.5 | * |
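
One way to read the table is to check whether two results can be told apart given their error margins. The sketch below assumes each entry is a score followed by a symmetric error margin (for example 40.53 and 5.6 for omne v0.1 on GAIA); the `Score` type and `intervals_overlap` helper are hypothetical names introduced only for illustration. Overlapping intervals are a rough heuristic, not a formal significance test.

```python
from typing import NamedTuple


class Score(NamedTuple):
    mean: float    # reported score, in percent
    margin: float  # reported error margin, assumed symmetric around the score


def intervals_overlap(a: Score, b: Score) -> bool:
    """Return True when [mean - margin, mean + margin] of a and b intersect."""
    return abs(a.mean - b.mean) <= a.margin + b.margin


# GAIA column values copied from the table.
omne = Score(40.53, 5.6)
magentic_one_o1 = Score(38.00, 5.5)
gpt4 = Score(6.67, 2.8)

print(intervals_overlap(magentic_one_o1, omne))  # True
print(intervals_overlap(magentic_one_o1, gpt4))  # False
```

Under this reading, the Magentic-One (GPT-4o, o1) GAIA score is not clearly separable from the top entry, while it is clearly separated from the plain GPT-4 baseline.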