
Published on November 19

Reward Modeling with Ordinal Feedback: Wisdom of the Crowd

Categories: cs.AI, cs.CL, stat.ML

Released Date: November 19, 2024

Authors: Shang Liu¹, Yu Pan², Guanting Chen³, Xiaocheng Li¹

Affiliations: ¹Imperial College Business School, Imperial College London; ²Department of Intelligent Transportation, HKUST-GZ; ³Department of Statistics and Operations Research, University of North Carolina

arXiv: http://arxiv.org/abs/2411.12843v1

Model	Tied Ratio	Oracle CE Loss (Mean)	Oracle CE Loss (Std)	ID Accuracy (Mean)	ID Accuracy (Std)	OOD Accuracy (Mean)	OOD Accuracy (Std)
Llama	0%	1.0421	0.0363	0.9224	0.0080	0.7661	0.0182
Llama	25%	0.3327	0.0051	0.9341	0.0173	0.7672	0.0093
Llama	50%	0.4187	0.0043	0.9336	0.0014	0.7545	0.0082
Llama	75%	0.5339	0.0052	0.9268	0.0180	0.7749	0.0008
Llama	100%	0.6931	0.0017	0.3428	0.0677	0.4424	0.0393
Gemma	0%	6.4762	0.3392	0.9355	0.0041	0.8319	0.0080
Gemma	25%	0.6031	0.0019	0.9467	0.0117	0.8487	0.0100
Gemma	50%	0.5775	0.0001	0.9526	0.0075	0.8277	0.0006
Gemma	75%	0.6122	0.0006	0.9521	0.0069	0.8236	0.0084
Gemma	100%	0.6931	0.0001	0.4814	0.0055	0.4928	0.0158
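
Both 100% rows report an Oracle CE loss of 0.6931, which matches ln 2 ≈ 0.6931. The following is a minimal Python sketch of one possible reading of that number, not the paper's evaluation code. It assumes that the Oracle CE loss is a binary cross-entropy against an oracle preference probability (the definition is not given on this page) and that a model trained only on tied labels ends up predicting 0.5 for every pair. The function name and the `oracle_probs` values are hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(target: float, predicted: float) -> float:
    """Cross-entropy between a target preference probability and a prediction."""
    return -(target * math.log(predicted) + (1.0 - target) * math.log(1.0 - predicted))


# Hypothetical oracle preference probabilities (illustrative values, not from the paper).
oracle_probs = [0.1, 0.5, 0.8, 0.95]

# Assumption: a model that cannot separate the two responses predicts 0.5 for every pair.
losses = [binary_cross_entropy(p, 0.5) for p in oracle_probs]
mean_loss = sum(losses) / len(losses)

print(round(mean_loss, 4))    # 0.6931
print(round(math.log(2), 4))  # 0.6931
```

Under these assumptions the loss equals ln 2 whatever the oracle probabilities are, since both terms of the cross-entropy reduce to -ln 0.5. This would be consistent with the accuracies in the same rows, which sit at or below chance level.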