Published on November 12

cs.AI

cs.CL

Release Date: November 12, 2024

Authors: Qingyu Yin¹, Chak Tou Leong², Hongbo Zhang¹, Minjun Zhu, Hanqi Yan³, Qiang Zhang⁴, Yulan He³, Wenjie Li², Jun Wang⁵, Yue Zhang¹, Linyi Yang¹

Affiliations: ¹Westlake University; ²The Hong Kong Polytechnic University; ³King's College London; ⁴Zhejiang University; ⁵University College London

Method	LPD	Margin	Constraint	Constraint Type
DPO	$\beta\log\pi_{\theta}(y_{w}\mid x)-\beta\log\pi_{\theta}(y_{l}\mid x)$	$\gamma_{\text{ref}}$	$0$	-
SimPO	$\frac{\beta}{\lvert y_{w}\rvert}\log\pi_{\theta}(y_{w}\mid x)-\frac{\beta}{\lvert y_{l}\rvert}\log\pi_{\theta}(y_{l}\mid x)$	$\gamma$ (a constant)	$0$	-
$\text{TDPO}_{i}$	$\beta\log\pi_{\theta}(y_{w}\mid x)-\beta\log\pi_{\theta}(y_{l}\mid x)$	$\gamma_{\text{ref}}$	$\delta_{\text{TDPO}_{i}}(x,y_{w},y_{l})$	KL Divergence
FPO	$\frac{\beta}{\lvert y_{w}\rvert}\log\pi_{\theta}(y_{w}\mid x)-\frac{\beta}{\lvert y_{l}\rvert}\log\pi_{\theta}(y_{l}\mid x)$	$\gamma_{\text{ref-LN}}$	$\delta_{\text{FPO}}(x,y_{w},y_{l})$	MSE
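The two LPD forms in the table differ only in whether each sequence log-probability is divided by the response length. The Python sketch below illustrates that difference on toy numbers. It assumes that the sequence log-probability is the sum of per-token log-probabilities and that the response length is the token count. The function names, the toy log-probabilities, and the value of beta are hypothetical and chosen only for illustration. The margin and constraint columns are not modeled here.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Sum per-token log-probabilities (assumed factorization of log pi(y | x))."""
    return sum(token_logprobs)


def lpd_unnormalized(beta: float,
                     logps_w: Sequence[float],
                     logps_l: Sequence[float]) -> float:
    """LPD as in the DPO and TDPO_i rows: beta*log pi(y_w|x) - beta*log pi(y_l|x)."""
    return beta * sequence_logprob(logps_w) - beta * sequence_logprob(logps_l)


def lpd_length_normalized(beta: float,
                          logps_w: Sequence[float],
                          logps_l: Sequence[float]) -> float:
    """LPD as in the SimPO and FPO rows: each term is divided by its token count."""
    term_w = (beta / len(logps_w)) * sequence_logprob(logps_w)
    term_l = (beta / len(logps_l)) * sequence_logprob(logps_l)
    return term_w - term_l


# Toy example (hypothetical values): both responses have the same total
# log-probability (-2.0), but the preferred one is twice as long.
beta = 0.5
chosen = [-0.5, -0.5, -0.5, -0.5]   # y_w: 4 tokens
rejected = [-1.0, -1.0]             # y_l: 2 tokens

print(lpd_unnormalized(beta, chosen, rejected))       # 0.0
print(lpd_length_normalized(beta, chosen, rejected))  # 0.25
```

In this toy case the unnormalized LPD sees no difference between the two responses, while the length-normalized LPD favors the longer response because its average per-token log-probability is higher.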