V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
Subjects: cs.CV, cs.AI
Released Date: November 5, 2024
Authors: Yuxi Xie, Guanzhen Li, Xiao Xu, Min-Yen Kan
Affiliation: National University of Singapore

| Approach | F1R | F1P | F1A | F1 | Yes Ratio |
|---|---|---|---|---|---|
| SFT | | | | | |
| HA-DPO | | | | | |
| Synthetic Augmented Data | | | | | |
| DPO | | | | | |
| V-DPO | ↑0.94 | | | | |
| RLHF-V | | | | | |
| DPO | | | | | |
| V-DPO | ↑1.24 | | | | |
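The table above compares V-DPO against SFT, DPO, and related preference-optimization baselines. As background only, the vanilla DPO objective that V-DPO extends can be sketched as below; this is a minimal scalar illustration of the standard DPO loss (Rafailov et al.), not the paper's vision-guided variant, and the function name, inputs, and `beta` value are assumptions for illustration.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a sequence log-probability: the policy's log-prob of the
    chosen/rejected response, and the frozen reference model's log-probs of
    the same responses. beta controls deviation from the reference policy.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response than the reference model does, minus the same for the rejected.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Loss is -log sigmoid(margin): minimized as the policy widens the gap
    # between chosen and rejected relative to the reference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; increasing the chosen response's log-probability lowers the loss. V-DPO augments this preference signal with vision-guided supervision to target hallucination, but those details are in the paper, not this sketch.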