Published at November 19
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
cs.AI, cs.CL, stat.ML
Released Date: November 19, 2024
Authors: Shang Liu (1), Yu Pan (2), Guanting Chen (3), Xiaocheng Li (1)
Affiliations: (1) Imperial College Business School, Imperial College London; (2) Department of Intelligent Transportation, HKUST-GZ; (3) Department of Statistics and Operations Research, University of North Carolina

| Model | Tied Ratio | Oracle CE Loss (Mean) | Oracle CE Loss (Std) | ID Accuracy (Mean) | ID Accuracy (Std) | OOD Accuracy (Mean) | OOD Accuracy (Std) |
|---|---|---|---|---|---|---|---|
| Llama | 0% | 1.0421 | 0.0363 | 0.9224 | 0.0080 | 0.7661 | 0.0182 |
| Llama | 25% | 0.3327 | 0.0051 | 0.9341 | 0.0173 | 0.7672 | 0.0093 |
| Llama | 50% | 0.4187 | 0.0043 | 0.9336 | 0.0014 | 0.7545 | 0.0082 |
| Llama | 75% | 0.5339 | 0.0052 | 0.9268 | 0.0180 | 0.7749 | 0.0008 |
| Llama | 100% | 0.6931 | 0.0017 | 0.3428 | 0.0677 | 0.4424 | 0.0393 |
| Gemma | 0% | 6.4762 | 0.3392 | 0.9355 | 0.0041 | 0.8319 | 0.0080 |
| Gemma | 25% | 0.6031 | 0.0019 | 0.9467 | 0.0117 | 0.8487 | 0.0100 |
| Gemma | 50% | 0.5775 | 0.0001 | 0.9526 | 0.0075 | 0.8277 | 0.0006 |
| Gemma | 75% | 0.6122 | 0.0006 | 0.9521 | 0.0069 | 0.8236 | 0.0084 |
| Gemma | 100% | 0.6931 | 0.0001 | 0.4814 | 0.0055 | 0.4928 | 0.0158 |