Published on October 30

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

Categories: cs.CV, cs.AI, cs.CL, cs.LG

Released Date: October 30, 2024

Authors: Junjie Wu¹, Tsz Ting Chung¹, Kai Chen¹, Dit-Yan Yeung¹

Aff.: ¹The Hong Kong University of Science and Technology

arXiv: http://arxiv.org/abs/2410.23114v1

| Method | GPT-4 Judge, Overall $\text{Hallu}_{\text{I}}\downarrow$ | GPT-4 Judge, Overall $\text{Hallu}_{\text{Q}}\downarrow$ | GPT-4 Judge, Object $\text{Hallu}_{\text{I}}\downarrow$ | GPT-4 Judge, Object $\text{Hallu}_{\text{Q}}\downarrow$ | GPT-4 Judge, Relation $\text{Hallu}_{\text{I}}\downarrow$ | GPT-4 Judge, Relation $\text{Hallu}_{\text{Q}}\downarrow$ | NLI Judge, Overall $\text{Hallu}_{\text{I}}\downarrow$ | NLI Judge, Overall $\text{Hallu}_{\text{Q}}\downarrow$ |
|---|---|---|---|---|---|---|---|---|
| MiniGPT-4 | 53.60 | 51.79 | 28.32 | 26.77 | 25.25 | 24.98 | 55.61 | 53.36 |
| InstructBLIP | 46.68 | 45.57 | 22.19 | 20.88 | 24.50 | 24.69 | 58.25 | 55.56 |
| LLaVA | 42.34 | 41.30 | 19.88 | 18.50 | 22.46 | 22.80 | 54.49 | 51.51 |
| Shikra | 42.20 | 41.76 | 18.55 | 17.54 | 23.65 | 24.22 | 56.46 | 53.98 |
| LLaVA-1.5 | 40.66 | 39.10 | 18.63 | 17.28 | 22.03 | 21.82 | 54.14 | 51.67 |
| LLaMA-3.2 | 40.16 | 38.95 | 22.30 | 21.08 | 17.86 | 17.87 | 48.46 | 45.64 |
| InternLM2 | 38.83 | 37.54 | 18.25 | 17.50 | 20.58 | 20.04 | 54.41 | 52.08 |
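
As a reading aid for the table above, the following minimal sketch ranks the seven models by their overall $\text{Hallu}_{\text{I}}$ score under each judge. The numbers are copied from the table. The dictionary layout and the helper name `rank_models` are illustrative assumptions, not part of the paper's evaluation code.

```python
# Overall Hallu_I scores copied from the table above (lower is better).
# Tuple layout (an assumption for this sketch): (GPT-4 judge, NLI judge).
overall_hallu_i = {
    "MiniGPT-4": (53.60, 55.61),
    "InstructBLIP": (46.68, 58.25),
    "LLaVA": (42.34, 54.49),
    "Shikra": (42.20, 56.46),
    "LLaVA-1.5": (40.66, 54.14),
    "LLaMA-3.2": (40.16, 48.46),
    "InternLM2": (38.83, 54.41),
}


def rank_models(scores, judge_index):
    """Hypothetical helper: model names sorted from lowest to highest score."""
    return sorted(scores, key=lambda name: scores[name][judge_index])


gpt4_ranking = rank_models(overall_hallu_i, 0)
nli_ranking = rank_models(overall_hallu_i, 1)

# Print the two orderings side by side for comparison.
for position, (gpt4_model, nli_model) in enumerate(
    zip(gpt4_ranking, nli_ranking), start=1
):
    print(f"{position}. GPT-4 judge: {gpt4_model:<14} NLI judge: {nli_model}")
```

On these numbers, InternLM2 has the lowest overall score under the GPT-4 judge while LLaMA-3.2 has the lowest under the NLI judge, so the two judges do not produce the same ordering of models.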