AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
Categories: cs.AI, cs.CL
Released: November 20, 2024
Authors: Gaurav Verma1, Rachneet Kaur2, Nishan Srishankar2, Zhen Zeng2, Tucker Balch2, Manuela Veloso2
Affiliations: 1Georgia Institute of Technology; 2J.P. Morgan AI Research

Ele. Acc., Op. F1, Step SR, and Overall SR are measured offline against human trajectories; the final column is the overall success rate in a live environment.

| Type | Model | Ele. Acc. | Op. F1 | Step SR | Overall SR | Live Env. Overall SR |
|---|---|---|---|---|---|---|
| **Proprietary models** | | | | | | |
| Baseline | SeeAct (GPT-4o) | 56.03 | 57.17 | 52.17 | 18.75 | 17.56 |
| Adapted | SeeAct + 1-ICMD | 59.15 | 63.18 | 55.27 | 22.42 | 21.36 |
| Baseline | SeeAct* (GPT-4o) | 57.52 | 59.16 | 53.16 | 18.78 | 18.04 |
| Adapted | SeeAct* + 1-ICMD | 61.46 | 64.12 | 56.72 | 23.86 | 23.15 |
| **Open-weights models** | | | | | | |
| Baseline | CogAgent-FT | 52.31 | 55.64 | 48.70 | 8.78 | 6.43 |
| Baseline | CogAgent-FT (DE) | 48.62 | 51.71 | 44.81 | 6.81 | 5.11 |
| Adapted | CogAgent-FOMAML | 57.20 | 59.14 | 51.29 | 11.01 | 8.47 |
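The "CogAgent-FOMAML" row refers to adaptation via first-order MAML: meta-learn an initialization so that a few gradient steps on a handful of demonstrations yield a well-adapted model. As a rough illustration of the optimization pattern only, here is a minimal FOMAML sketch on a toy 1-D regression family; the task setup, function names, and hyperparameters are invented for this example and are not the paper's CogAgent training code.

```python
import random

# Toy first-order MAML (FOMAML) sketch: meta-learn an initialization for
# 1-D linear regression tasks y = a*x, adapting from a few support examples
# per task. Hypothetical illustration, not the paper's implementation.

def loss_grad(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    g = 0.0
    for x, y in batch:
        g += 2 * (w * x - y) * x
    return g / len(batch)

def fomaml(tasks, meta_steps=200, inner_lr=0.05, meta_lr=0.05, inner_steps=3):
    w = 0.0  # meta-initialization being learned
    for _ in range(meta_steps):
        support, query = random.choice(tasks)
        # Inner loop: adapt a copy of the parameters on the support set
        # (the "few-shot demonstrations" for this task).
        w_adapted = w
        for _ in range(inner_steps):
            w_adapted -= inner_lr * loss_grad(w_adapted, support)
        # First-order meta-update: apply the query-set gradient evaluated at
        # the adapted parameters directly, ignoring second derivatives.
        w -= meta_lr * loss_grad(w_adapted, query)
    return w

def make_task(a):
    """A task is y = a*x; split its points into support and query sets."""
    pts = [(x / 4.0, a * x / 4.0) for x in range(-4, 5)]
    return pts[:5], pts[5:]

random.seed(0)
tasks = [make_task(1.5), make_task(2.5)]
w_meta = fomaml(tasks)
print(w_meta)  # should land near 2.0, between the two task optima
```

The meta-parameters settle between the per-task optima, so a few inner-loop steps on either task's support set suffice to specialize, which is the same few-shot adaptation idea the table's FOMAML row reports at the scale of a multimodal agent.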