Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Published: November 10
cs.AI
Released Date: November 10, 2024
Authors: Yu Gu¹, Boyuan Zheng¹, Boyu Gou¹, Kai Zhang¹, Cheng Chang², Sanjari Srivastava², Yanan Xie², Peng Qi², Huan Sun¹, Yu Su¹
Affiliations: ¹The Ohio State University; ²Orby AI

| Benchmark | Observation | Method | Completion Rate | Success Rate |
|---|---|---|---|---|
| VisualWebArena | Screenshot+SoM | Gemini-1.5-Pro + Reactive (Koh et al., 2024a) | - | 12.0% |
| | | GPT-4 + Reactive (Koh et al., 2024a) | - | 16.4% |
| | | GPT-4o + Reactive (Koh et al., 2024a) | - | 17.7%† |
| | | GPT-4o + Tree Search (Koh et al., 2024b) | - | 26.4% |
| | | GPT-4o + WebDreamer | - | 23.6% (↑33.3%) |
| Mind2Web-live | HTML | GPT-4 + Reactive (Pan et al., 2024b) | 48.8% | 23.1% |
| | | Claude-3-Sonnet + Reactive (Pan et al., 2024b) | 47.9% | 22.1% |
| | | Gemini-1.5-Pro + Reactive (Pan et al., 2024b) | 44.6% | 22.3% |
| | | GPT-4-turbo + Reactive (Pan et al., 2024b) | 44.3% | 21.1% |
| | | GPT-3.5-turbo + Reactive (Pan et al., 2024b) | 40.2% | 16.5% |
| | | GPT-4o + Reactive (Pan et al., 2024b) | 47.6% | 22.1% |
| | | GPT-4o + WebDreamer | 49.9% | 25.0% (↑13.1%) |
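
The percentages in parentheses next to the WebDreamer success rates appear to be relative gains over the GPT-4o + Reactive row of the same benchmark. That reading is an assumption, since the table itself does not define the arrow, but the arithmetic is consistent with it. A minimal sketch of the check, where `relative_gain` is a hypothetical helper name and not something from the paper:

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Success rates (%) copied from the table: GPT-4o + Reactive vs. GPT-4o + WebDreamer.
# Assumption: the reactive GPT-4o row is the baseline behind the arrow figures.
vwa_gain = relative_gain(17.7, 23.6)  # VisualWebArena
m2w_gain = relative_gain(22.1, 25.0)  # Mind2Web-live

print(f"VisualWebArena: {vwa_gain:.1f}%")
print(f"Mind2Web-live:  {m2w_gain:.1f}%")
```

Under this assumption the two values round to 33.3% and 13.1%, matching the figures reported in the table.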