Published on November 21
NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews
Categories: cs.CL, cs.AI
Released Date: November 21, 2024
Authors: Michael Lu¹, Hyundong Justin Cho, Weiyan Shi², Jonathan May³, Alexander Spangher³
Affiliations: ¹University of California, Berkeley; ²Northeastern University; ³University of Southern California

| | Exact Match | Info. | Motivation | Style | Discourse | Context |
|---|---|---|---|---|---|---|
| Baseline-LLM | 3.9% | 4.4% | 4.7% | 11.9% | 36.2% | 53.0% |
| Chain-of-Thought (CoT) | 4.5% | 3.6% | 5.2% | 12.8% | 37.0% | 56.9% |
| LLM w. Outline | 3.7% | 3.8% | 4.1% | 9.6% | 36.2% | 46.6% |
| Outline-CoT | 3.6% | 3.9% | 4.3% | 8.3% | 29.9% | 43.1% |
| Human | 8.2% | 17.5% | 35.4% | 40.2% | 54.5% | 60.3% |
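
To make the distance between the Human row and each LLM row easier to read, the following is a minimal sketch that subtracts each LLM row from the Human row, column by column. It assumes the six header labels apply, in order, to the six numeric columns. The table does not say what the percentages measure, so the sketch only does arithmetic on the printed values. The names `COLUMNS`, `ROWS`, and `gap_to_human` are hypothetical and not taken from the paper.

```python
# Values copied from the table above, in percent.
# Assumption: the six header labels map, in order, onto the six numeric columns.
COLUMNS = ["Exact Match", "Info.", "Motivation", "Style", "Discourse", "Context"]

ROWS = {
    "Baseline-LLM": [3.9, 4.4, 4.7, 11.9, 36.2, 53.0],
    "Chain-of-Thought (CoT)": [4.5, 3.6, 5.2, 12.8, 37.0, 56.9],
    "LLM w. Outline": [3.7, 3.8, 4.1, 9.6, 36.2, 46.6],
    "Outline-CoT": [3.6, 3.9, 4.3, 8.3, 29.9, 43.1],
    "Human": [8.2, 17.5, 35.4, 40.2, 54.5, 60.3],
}


def gap_to_human(rows, reference="Human"):
    """For every non-reference row, return reference minus row, per column.

    Results are in percentage points, rounded to one decimal place.
    """
    ref = rows[reference]
    return {
        name: [round(r - v, 1) for r, v in zip(ref, values)]
        for name, values in rows.items()
        if name != reference
    }


gaps = gap_to_human(ROWS)
for name, diffs in gaps.items():
    # Pair each difference with its column label for readable output.
    print(name, dict(zip(COLUMNS, diffs)))
```

A positive value means the Human row is higher than the LLM row in that column, by that many percentage points.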