IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Subjects: cs.CL, cs.AI
Release Date: November 9, 2024
Authors: Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, Yongbin Li
Affiliation: Tongyi Lab

| Model | Method | Trace IF-S | Trace IF-M | IFEval S-Acc | IFEval L-Acc | CFBench CSR | CFBench ISR | CFBench PSR |
|---|---|---|---|---|---|---|---|---|
| Qwen2-7B | Instruct | 72.5 | 54.5 | 51.6 | 56.4 | 75.8 | 39.1 | 50.2 |
| | SFT | 76.0 | 56.1 | 52.3 | 54.2 | 77.8 | 40.4 | 52.9 |
| | PPO | 77.0 | 57.7 | 51.4 | 53.8 | 76.2 | 38.8 | 50.6 |
| | DPO | 79.0 | 67.2 | 52.7 | 58.2 | 80.0 | 45.1 | 57.9 |
| | IOPO (Ours) | 82.0 | 68.9 | 59.9 | 63.6 | 80.7 | 47.0 | 58.7 |
| | Improv. | ↑3.0 | ↑1.7 | ↑7.2 | ↑5.4 | ↑0.7 | ↑1.9 | ↑0.8 |
| Llama3.1-8B | Instruct | 67.5 | 52.9 | 74.3 | 78.6 | 71.4 | 35.7 | 46.9 |
| | SFT | 75.5 | 62.9 | 71.0 | 74.1 | 78.4 | 43.2 | 54.7 |
| | PPO | 75.0 | 57.3 | 69.9 | 72.3 | 75.9 | 40.9 | 50.7 |
| | DPO | 79.0 | 69.2 | 71.5 | 76.5 | 80.8 | 48.1 | 59.8 |
| | IOPO (Ours) | 81.5 | 70.7 | 78.2 | 81.0 | 81.8 | 49.9 | 61.1 |
| | Improv. | ↑2.5 | ↑1.5 | ↑6.7 | ↑4.5 | ↑1.0 | ↑1.8 | ↑1.3 |
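In the table above, the Improv. rows measure IOPO against DPO, the strongest baseline on most columns. For context on what the DPO baseline optimizes, below is a minimal sketch of the standard DPO objective, assuming PyTorch and hypothetical function and variable names (not from the paper); the exact IOPO loss is defined in the paper itself, and, as the title indicates, it builds preference signals over both the input (instruction) side and the output side, whereas DPO only contrasts two outputs for a fixed input.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective used as a baseline in the table.

    All inputs are per-sequence summed log-probabilities, shape (batch,).
    Returns -log sigmoid(beta * ((policy margin) - (reference margin))).
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_margin - ref_margin)
    # Negative log-likelihood of preferring the chosen response under a
    # Bradley-Terry model, with the reference policy as the anchor.
    return -F.logsigmoid(logits).mean()
```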