Published on December 6

Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies

cs.CL

Released Date: December 6, 2024

Authors: Recep Firat Cekinel, Pinar Karagoz, Cagri Coltekin

arXiv: http://arxiv.org/pdf/2412.05155v1

Models	Inputs	MOCHEG Support	MOCHEG Refute	MOCHEG NEI	MOCHEG F1-macro	FACTIFY2 Support	FACTIFY2 Refute	FACTIFY2 NEI	FACTIFY2 F1-macro
Qwen-7B	text	0.533	0.262	0.169	0.321	0.524	0.458	0.281	0.421
Mistral-7B	text	0.505	0.281	0.216	0.334	0.575	0.561	0.093	0.409
Gemma-2b	text	0.610	0.462	0.315	0.462	0.562	0.119	0.083	0.255
Qwen-VL	text + image	0.168	0.472	0.186	0.275	0.463	0.460	0.369	0.431
Idefics2-8b	text + image	0.619	0.547	0.385	0.517	0.586	0.644	0.303	0.511
PaliGemma-3b	text + image	0.222	0.347	0.449	0.339	0.149	0.139	0.186	0.158
LVLM4FV	text	0.575	0.542	0.439	0.519	0.593	0.581	0.560	0.578
LVLM4FV	text + image	0.578	0.569	0.457	0.535	0.678	0.605	0.508	0.597
MOCHEG	text + image	0.490	0.604	0.282	0.459	0.547	0.621	0.275	0.481
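
The F1-macro columns in the table above appear to be the unweighted mean of the three per-class columns (Support, Refute, NEI). The page does not state this explicitly, so the short Python sketch below treats it as an assumption and checks it against a few rows copied from the table. The names ROWS, macro_average and matches_reported are illustrative and do not come from the paper, and the tolerance of 0.001 is an assumed allowance for the table's three-decimal rounding.

```python
# Sketch: check whether the reported F1-macro equals the unweighted mean
# of the per-class scores. Assumption: Support / Refute / NEI are per-class F1.

# (model, inputs, dataset) -> (Support, Refute, NEI, reported F1-macro)
ROWS = {
    ("Idefics2-8b", "text + image", "MOCHEG"): (0.619, 0.547, 0.385, 0.517),
    ("LVLM4FV", "text + image", "MOCHEG"): (0.578, 0.569, 0.457, 0.535),
    ("Mistral-7B", "text", "FACTIFY2"): (0.575, 0.561, 0.093, 0.409),
    ("LVLM4FV", "text + image", "FACTIFY2"): (0.678, 0.605, 0.508, 0.597),
}


def macro_average(per_class_scores):
    """Return the unweighted mean of the per-class scores."""
    return sum(per_class_scores) / len(per_class_scores)


def matches_reported(row, tolerance=0.001):
    """True if the recomputed macro average is within `tolerance` of the reported one."""
    *per_class, reported = row
    return abs(macro_average(per_class) - reported) <= tolerance


if __name__ == "__main__":
    for key, row in ROWS.items():
        recomputed = macro_average(row[:3])
        print(key, f"recomputed={recomputed:.4f}", f"reported={row[3]:.3f}",
              "ok" if matches_reported(row) else "mismatch")
```

Because the macro average weights every class equally, a low score on a single class (for example NEI) pulls the overall figure down regardless of how many examples that class contains.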