notesum.ai

Published on December 9

Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels

cs.CV

Release Date: December 9, 2024

Authors: Weijie Tu¹, Weijian Deng¹, Dylan Campbell¹, Yu Yao², Jiyang Zheng², Tom Gedeon³, Tongliang Liu²

Affiliations: ¹Australian National University; ²University of Sydney; ³Curtin University

arXiv: http://arxiv.org/pdf/2412.06461v1

Method	AI2D (MCVQ)	MMMU (MCVQ)	TextVQA (VQA)	ChartQA (VQA)	Average
AoL	0.42	0.43	0.57	0.64	0.52
NLL_min	0.29	0.59	0.97	0.99	0.71
NLL_avg	0.26	0.28	0.98	0.98	0.63
Ent_max	0.37	0.62	0.97	0.96	0.73
Ent_avg	0.11	0.38	0.98	0.98	0.63
Sample_BLEU	0.63	0.73	0.76	0.64	0.69
ATC	0.12	0.59	0.85	0.84	0.60
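
As a quick sanity check on the table, the sketch below recomputes the Average column from the four per-dataset columns. It assumes the Average is the unweighted mean of AI2D, MMMU, TextVQA, and ChartQA, which the source does not state explicitly. The helper name `recomputed_average` is hypothetical. Because the per-dataset scores are already rounded to two decimals, small differences from the reported averages are expected.

```python
# Table values copied from above: (AI2D, MMMU, TextVQA, ChartQA, reported Average).
rows = {
    "AoL":         (0.42, 0.43, 0.57, 0.64, 0.52),
    "NLL_min":     (0.29, 0.59, 0.97, 0.99, 0.71),
    "NLL_avg":     (0.26, 0.28, 0.98, 0.98, 0.63),
    "Ent_max":     (0.37, 0.62, 0.97, 0.96, 0.73),
    "Ent_avg":     (0.11, 0.38, 0.98, 0.98, 0.63),
    "Sample_BLEU": (0.63, 0.73, 0.76, 0.64, 0.69),
    "ATC":         (0.12, 0.59, 0.85, 0.84, 0.60),
}


def recomputed_average(scores):
    """Unweighted mean of the per-dataset scores (an assumption, not stated in the source)."""
    return sum(scores) / len(scores)


# Compare the recomputed mean with the reported Average for every method.
for method, (*scores, reported) in rows.items():
    mean = recomputed_average(scores)
    diff = abs(mean - reported)
    print(f"{method:12s} recomputed={mean:.4f} reported={reported:.2f} diff={diff:.4f}")
```

Under this assumption, every recomputed mean agrees with the reported Average to within 0.02.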