notesum.ai
Published on December 5

Challenges in Trustworthy Human Evaluation of Chatbots
cs.HC
Release Date: December 5, 2024
Authors: Wenting Zhao¹, Alexander M. Rush¹, Tanya Goyal¹
Affiliation: ¹Cornell University

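| Model | Leaderboard rank (Orig.) | Rank (r=1) | Rank (r=5) | Rank (r=10) |
|---|---|---|---|---|
| Llama-2-7b-chat | 21 | 21 | 20 (↑1) | 21 |
| Llama-2-13b-chat | 39 | 39 | 41 (↓2) | 34 (↑5) |
| Mistral-7b-instruct-v0.2 | 36 | 38 (↓2) | 38 (↓2) | 41 (↓5) |

↑n / ↓n: the model moved up / down n places relative to its original (Orig.) rank.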