Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
cs.CL
cs.AI
Released Date: November 2, 2024
Authors: Seonil Son, Ju-Min Oh, Heegon Jin, Cheolhun Jang, Jeongbeom Jeong, Kuntae Kim
Affiliation: NCSOFT, Republic of Korea

| Spearman corr. | 50 | 100 | 250 | 475 | 500 |
|---|---|---|---|---|---|
| anchored (4o) | 0.895 | 0.935 | 0.963 | 0.966 | 0.964 |
| tournament (4o) | 0.905 | 0.940 | 0.960 | 0.970 | 0.970 |
| anchored (4o-mini) | 0.895 | 0.908 | 0.917 | 0.916 | 0.912 |
| tournament (4o-mini) | 0.901 | 0.919 | 0.931 | 0.933 | 0.933 |
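
The table reports Spearman correlations, which measure how closely two rankings agree (1.0 means identical ordering). As a minimal sketch of how such a value can be computed, the snippet below ranks two score lists and takes the Pearson correlation of the ranks. The function names and the example scores are hypothetical and are not taken from the paper; they only illustrate the metric.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five models (illustrative only, not from the paper).
reference_scores = [1200, 1150, 1100, 1050, 1000]
benchmark_scores = [0.90, 0.80, 0.85, 0.60, 0.50]
rho = spearman(reference_scores, benchmark_scores)
```

In this made-up example only two adjacent models swap places, so the correlation stays high but falls short of 1.0.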