Benchmarking Distributional Alignment of Large Language Models
Categories: cs.CL, cs.AI
Release date: November 8, 2024
Authors: Nicole Meister, Carlos Guestrin, Tatsunori Hashimoto
Affiliation: Stanford University

| Model (output format) | Distance to target distribution (mean ± error; lower is better) |
|---|---|
| GPT-4 (V) | 0.204 ± 0.004 |
| Anthropic Opus (V) | 0.219 ± 0.005 |
| Llama 3 70B (V) | 0.226 ± 0.004 |
| Anthropic Haiku (V) | 0.235 ± 0.005 |
| GPT-4 (Seq) | 0.237 ± 0.004 |
| Humans (V) | 0.247 ± 0.004 |
| GPT-3.5-Turbo (V) | 0.259 ± 0.005 |
| GPT-4 (TS-Log-p) | 0.260 ± 0.004 |
| GPT-3.5-Turbo (Seq) | 0.278 ± 0.005 |
| Anthropic Haiku (Seq) | 0.287 ± 0.006 |
| GPT-3.5-Turbo (TS-Log-p) | 0.290 ± 0.005 |
| Llama 3 70B (Seq) | 0.320 ± 0.006 |
| Anthropic Opus (Seq) | 0.337 ± 0.006 |
| GPT-3.5-Turbo (Log-p) | 0.462 ± 0.007 |
| Llama 3 70B (TS-Log-p) | 0.460 ± 0.007 |
| Llama 3 70B (Log-p) | 0.515 ± 0.006 |
| GPT-4 (Log-p) | 0.582 ± 0.006 |
| Discretization Error (Seq) | 0.126 ± 0.006 |
| Uniform | 0.302 ± 0.005 |
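For each model and output format, the table reports a distance between the model's answer distribution and a target distribution. It also includes a Uniform baseline and a discretization floor for sequence outputs. The following is a minimal sketch of how numbers of this kind could be computed, and it rests on several assumptions. First, it assumes the distance is total variation distance. Second, it assumes "Discretization Error (Seq)" is the gap left when a distribution is represented by a fixed number of sampled answers; the helper `discretize` and the choice n = 30 are hypothetical. Third, it assumes the ± term is a standard error across questions. None of these specifics is stated in this section, and the answer distributions in the usage example are invented for illustration.

```python
import math
from typing import Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between two distributions.

    Assumption: the table's metric is TVD; this section does not name the metric.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def uniform(k: int) -> list[float]:
    """Uniform distribution over k answer options (a 'Uniform'-style baseline)."""
    return [1.0 / k] * k


def discretize(p: Sequence[float], n: int) -> list[float]:
    """Hypothetical helper: round p to a distribution expressible with n samples.

    Uses largest-remainder rounding so the integer counts sum exactly to n.
    """
    scaled = [x * n for x in p]
    counts = [math.floor(s) for s in scaled]
    remaining = n - sum(counts)
    # Hand the leftover samples to the options with the largest fractional parts.
    order = sorted(range(len(p)), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:remaining]:
        counts[i] += 1
    return [c / n for c in counts]


def mean_and_sem(values: Sequence[float]) -> tuple[float, float]:
    """Mean and standard error of the mean across questions (needs >= 2 values).

    Assumption: the table's ± term is a standard error; it may instead be a CI.
    """
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / (m - 1)  # sample variance
    return mean, math.sqrt(var / m)


if __name__ == "__main__":
    target = [0.52, 0.31, 0.17]  # hypothetical human answer distribution
    model = [0.60, 0.25, 0.15]   # hypothetical model output distribution
    print(round(total_variation(model, target), 3))                   # 0.08
    print(round(total_variation(uniform(3), target), 3))              # 0.187
    print(round(total_variation(discretize(target, 30), target), 3))  # 0.013
```

In this sketch, the discretization floor is nonzero because 30 samples can only express probabilities in steps of 1/30. Under these assumptions, a sequence-based output cannot score below that floor even when its underlying distribution is perfect.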