Published at November 10
Learning Loss Landscapes in Preference Optimization
cs.LG
cs.AI
stat.ML
Release Date: November 10, 2024
Authors: Carlo Alfano, Silvia Sapora, Jakob Nicolaus Foerster, Patrick Rebeschini, Yee Whye Teh

| Value agent 1 | Value agent 2 | Noise | Shuffled | ORPO | Discovered |
|---|---|---|---|---|---|
| 3900 | 1700 | 0 | No | 327749 | 347363 |
| 3900 | 1700 | 0 | Yes | 242558 | 3485163 |
| 3900 | 1700 | 0.1 | No | 267562 | 383752 |