Entropy Controllable Direct Preference Optimization
Categories: cs.LG, cs.AI, cs.CL
Release date: November 12, 2024
Authors: Motoki Omura¹, Yasuhiro Fujita², Toshiki Kataoka²
Affiliations: ¹The University of Tokyo; ²Preferred Networks, Inc.

| Method | GSM8K | HumanEval | MMLU-Pro | IFEval |
|---|---|---|---|---|
| DPO () | 26.40 ± 1.76 | 28.77 ± 0.45 | 31.83 ± 0.17 | 59.63 ± 0.72 |
| H-DPO () | 27.77 ± 1.39 | 30.70 ± 0.39 | 32.37 ± 0.03 | 60.17 ± 0.34 |
| H-DPO () | 28.83 ± 2.32 | 29.63 ± 0.45 | 32.30 ± 0.17 | 60.93 ± 0.50 |
| H-DPO () | 28.66 ± 1.23 | 27.77 ± 0.67 | 31.93 ± 0.19 | 59.90 ± 0.59 |
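To make the comparison against the DPO baseline explicit, the short script below transcribes the mean scores from the table, computes each H-DPO row's difference from DPO, and picks the highest-scoring row per benchmark. This is a minimal sketch that only does arithmetic on the published numbers. The ± values are not used. The row names `H-DPO #1` to `H-DPO #3` and the helper functions are hypothetical labels introduced here because the hyperparameter value inside each "H-DPO ()" label did not survive extraction. Rows are numbered in table order.

```python
# Mean scores transcribed from the results table (the ± values are ignored).
# Row names are hypothetical: the hyperparameter shown in each "H-DPO ()"
# label was lost in extraction, so the H-DPO rows are numbered in table order.
BENCHMARKS = ["GSM8K", "HumanEval", "MMLU-Pro", "IFEval"]
SCORES = {
    "DPO": [26.40, 28.77, 31.83, 59.63],
    "H-DPO #1": [27.77, 30.70, 32.37, 60.17],
    "H-DPO #2": [28.83, 29.63, 32.30, 60.93],
    "H-DPO #3": [28.66, 27.77, 31.93, 59.90],
}


def deltas_vs_baseline(scores, baseline="DPO"):
    """Per-benchmark difference (row minus baseline), rounded to 2 decimals."""
    base = scores[baseline]
    return {
        name: [round(v - b, 2) for v, b in zip(vals, base)]
        for name, vals in scores.items()
        if name != baseline
    }


def best_per_benchmark(scores):
    """Name of the row with the highest mean score on each benchmark."""
    return {
        bench: max(scores, key=lambda name: scores[name][i])
        for i, bench in enumerate(BENCHMARKS)
    }


if __name__ == "__main__":
    for name, diffs in deltas_vs_baseline(SCORES).items():
        print(name, dict(zip(BENCHMARKS, diffs)))
    print(best_per_benchmark(SCORES))
```

On these means, every H-DPO row matches or exceeds DPO on GSM8K, MMLU-Pro, and IFEval. H-DPO #3 falls 1.00 points below DPO on HumanEval. Several gaps, such as the GSM8K differences, are comparable in size to the reported ± values. The deltas are therefore plain point differences and should not be read as significance tests.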