IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Subjects: cs.CL, cs.AI
Release Date: November 9, 2024
Authors: Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, Yongbin Li
Affiliation: Tongyi Lab

| Model | Method | Trace IF-S | Trace IF-M | IFEval S-Acc | IFEval L-Acc | CFBench CSR | CFBench ISR | CFBench PSR |
|---|---|---|---|---|---|---|---|---|
| Qwen2-7B | Instruct | 72.5 | 54.5 | 51.6 | 56.4 | 75.8 | 39.1 | 50.2 |
| | SFT | 76.0 | 56.1 | 52.3 | 54.2 | 77.8 | 40.4 | 52.9 |
| | PPO | 77.0 | 57.7 | 51.4 | 53.8 | 76.2 | 38.8 | 50.6 |
| | DPO | 79.0 | 67.2 | 52.7 | 58.2 | 80.0 | 45.1 | 57.9 |
| | IOPO (Ours) | 82.0 | 68.9 | 59.9 | 63.6 | 80.7 | 47.0 | 58.7 |
| | Improv. | ↑3.0 | ↑1.7 | ↑7.2 | ↑5.4 | ↑0.7 | ↑1.9 | ↑0.8 |
| Llama3.1-8B | Instruct | 67.5 | 52.9 | 74.3 | 78.6 | 71.4 | 35.7 | 46.9 |
| | SFT | 75.5 | 62.9 | 71.0 | 74.1 | 78.4 | 43.2 | 54.7 |
| | PPO | 75.0 | 57.3 | 69.9 | 72.3 | 75.9 | 40.9 | 50.7 |
| | DPO | 79.0 | 69.2 | 71.5 | 76.5 | 80.8 | 48.1 | 59.8 |
| | IOPO (Ours) | 81.5 | 70.7 | 78.2 | 81.0 | 81.8 | 49.9 | 61.1 |
| | Improv. | ↑2.5 | ↑1.5 | ↑6.7 | ↑4.5 | ↑1.0 | ↑1.8 | ↑1.3 |
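In the table above, the Improv. rows measure IOPO against DPO, the strongest baseline on most columns. For context on what the DPO baseline optimizes, below is a minimal sketch of the standard DPO objective, assuming PyTorch and hypothetical function and variable names (not from the paper); the exact IOPO loss is defined in the paper itself, and, as the title indicates, it builds preference signals over both the input (instruction) side and the output side, whereas DPO only contrasts two outputs for a fixed input.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective used as a baseline in the table.

    All inputs are per-sequence summed log-probabilities, shape (batch,).
    Returns -log sigmoid(beta * ((policy margin) - (reference margin))).
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_margin - ref_margin)
    # Negative log-likelihood of preferring the chosen response under a
    # Bradley-Terry model, with the reference policy as the anchor.
    return -F.logsigmoid(logits).mean()
```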