
Published on December 5

Challenges in Trustworthy Human Evaluation of Chatbots

cs.HC

Release Date: December 5, 2024

Authors: Wenting Zhao¹, Alexander M. Rush¹, Tanya Goyal¹

Affiliation: ¹Cornell University

arXiv: http://arxiv.org/pdf/2412.04363v1


Leaderboard ranking per model (change relative to Orig. in parentheses)
Model	Orig.	r=1	r=5	r=10
Llama-2-7b-chat	21	21	20 (↑1)	21
Llama-2-13b-chat	39	39	41 (↓2)	34 (↑5)
Mistral-7b-instruct-v0.2	36	38 (↓2)	38 (↓2)	41 (↓5)
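
The arrows in the table can be rechecked with a few lines of Python. This is only a minimal sketch: the rankings are copied from the table, while the names `rankings`, `SETTINGS`, and `rank_shifts` are hypothetical helpers introduced here for illustration and do not come from the paper. It assumes that a smaller rank number means a higher leaderboard position, which is how the arrows in the table read.

```python
# Rankings copied from the table: (Orig., r=1, r=5, r=10).
rankings = {
    "Llama-2-7b-chat": (21, 21, 20, 21),
    "Llama-2-13b-chat": (39, 39, 41, 34),
    "Mistral-7b-instruct-v0.2": (36, 38, 38, 41),
}
SETTINGS = ("r=1", "r=5", "r=10")  # column labels as they appear in the table


def rank_shifts(row):
    """Return (original rank - new rank) for each setting.

    Positive: the model moved up the leaderboard (smaller rank number).
    Negative: the model moved down.
    """
    orig, *perturbed = row
    return [orig - new for new in perturbed]


# Map each model to its per-setting shift.
shifts = {
    model: dict(zip(SETTINGS, rank_shifts(row)))
    for model, row in rankings.items()
}

for model, per_setting in shifts.items():
    print(model, per_setting)
```

A positive value corresponds to an up arrow in the table and a negative value to a down arrow; a zero means the ranking is unchanged from the Orig. column.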