Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines
cs.CL
Release Date: November 25, 2024
Authors: Zi-Ao Ma¹, Tian Lan¹, Rong-Cheng Tu², Yong Hu³, Heyan Huang¹, Xian-Ling Mao¹
Affiliations: ¹School of Computer Science and Technology, Beijing Institute of Technology, China; ²Nanyang Technological University, Singapore; ³WeChat AI, Tencent Inc., China

Text-modal metrics: Flu., Rel., CP., Faith. Multi-modal metrics: Coher., Help., Ref., Recall.

| Model Type | Approach | Model | Flu. | Rel. | CP. | Faith. | Coher. | Help. | Ref. | Recall | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LLMs | Separate | GPT-4o | 0.80 | 0.82 | 0.69 | 0.89 | 0.53 | 0.44 | 0.43 | 0.43 | 0.63 |
| LLMs | Single | GPT-4o | 0.79 | 0.81 | 0.76 | 0.91 | 0.69 | 0.57 | 0.31 | 0.57 | 0.68 |
| LLMs | Single | Llama-3.1-70B-Instruct | 0.75 | 0.79 | 0.75 | 0.85 | 0.57 | 0.47 | 0.18 | 0.22 | 0.57 |
| LLMs | Single | Qwen2.5-72B-Instruct | 0.79 | 0.83 | 0.77 | 0.87 | 0.62 | 0.52 | 0.35 | 0.52 | 0.66 |
| LLMs | Single | Llama-3.1-8B-Instruct | 0.74 | 0.77 | 0.72 | 0.83 | 0.49 | 0.38 | 0.28 | 0.62 | 0.60 |
| LLMs | Single | Qwen2.5-7B-Instruct | 0.71 | 0.73 | 0.73 | 0.91 | 0.52 | 0.41 | 0.22 | 0.63 | 0.61 |
| LLMs | Single | Average | 0.75 | 0.79 | 0.75 | 0.87 | 0.56 | 0.45 | 0.27 | 0.51 | 0.62 |
| LLMs | Multi | GPT-4o | 0.78 | 0.76 | 0.68 | 0.87 | 0.68 | 0.60 | 0.81 | 0.97 | 0.77 |
| LLMs | Multi | Llama-3.1-70B-Instruct | 0.73 | 0.75 | 0.72 | 0.84 | 0.68 | 0.55 | 0.77 | 0.97 | 0.75 |
| LLMs | Multi | Qwen2.5-72B-Instruct | 0.77 | 0.77 | 0.72 | 0.85 | 0.69 | 0.58 | 0.78 | 0.97 | 0.77 |
| LLMs | Multi | Llama-3.1-8B-Instruct | 0.72 | 0.73 | 0.72 | 0.82 | 0.66 | 0.55 | 0.75 | 0.93 | 0.74 |
| LLMs | Multi | Qwen2.5-7B-Instruct | 0.74 | 0.74 | 0.74 | 0.83 | 0.64 | 0.55 | 0.76 | 0.95 | 0.74 |
| LLMs | Multi | Average | 0.75 | 0.75 | 0.72 | 0.84 | 0.67 | 0.57 | 0.77 | 0.96 | 0.75 |
| MLLMs | Single | GPT-4o | 0.78 | 0.82 | 0.76 | 0.90 | 0.64 | 0.53 | 0.24 | 0.60 | 0.66 |
| MLLMs | Single | Llama-3.2-90B-V-Instruct | 0.77 | 0.68 | 0.64 | 0.76 | 0.51 | 0.34 | 0.10 | 0.01 | 0.48 |
| MLLMs | Single | Qwen2-VL-72B-Instruct | 0.77 | 0.78 | 0.72 | 0.88 | 0.43 | 0.32 | 0.15 | 0.16 | 0.53 |
| MLLMs | Single | Llama-3.2-11B-V-Instruct | 0.77 | 0.64 | 0.60 | 0.68 | 0.26 | 0.21 | 0.02 | 0.02 | 0.40 |
| MLLMs | Single | Qwen2-VL-7B-Instruct | 0.66 | 0.64 | 0.71 | 0.83 | 0.39 | 0.30 | 0.23 | 0.15 | 0.49 |
| MLLMs | Single | Average | 0.75 | 0.71 | 0.69 | 0.81 | 0.53 | 0.43 | 0.22 | 0.19 | 0.54 |
| MLLMs | Multi | GPT-4o | 0.77 | 0.78 | 0.72 | 0.86 | 0.65 | 0.56 | 0.81 | 0.97 | 0.76 |
| MLLMs | Multi | Llama-3.2-90B-V-Instruct | 0.72 | 0.70 | 0.67 | 0.74 | 0.55 | 0.43 | 0.62 | 0.94 | 0.67 |
| MLLMs | Multi | Qwen2-VL-72B-Instruct | 0.75 | 0.74 | 0.71 | 0.80 | 0.63 | 0.52 | 0.73 | 0.94 | 0.73 |
| MLLMs | Multi | Llama-3.2-11B-V-Instruct | 0.72 | 0.69 | 0.70 | 0.69 | 0.47 | 0.34 | 0.34 | 0.81 | 0.59 |
| MLLMs | Multi | Qwen2-VL-7B-Instruct | 0.70 | 0.75 | 0.73 | 0.77 | 0.54 | 0.42 | 0.51 | 0.94 | 0.67 |
| MLLMs | Multi | Average | 0.73 | 0.73 | 0.70 | 0.77 | 0.57 | 0.46 | 0.62 | 0.92 | 0.69 |
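For readers checking the table, the Overall column is consistent with the unweighted mean of the eight metric columns (four text-modal plus four multi-modal). This is an inference from the reported numbers, not a definition quoted from the paper, which may specify a different aggregation. A minimal sketch under that assumption:

```python
# Assumption: Overall = unweighted mean of the eight metric scores.
# Example row: LLMs / Separate / GPT-4o.
scores = {
    "Flu.": 0.80, "Rel.": 0.82, "CP.": 0.69, "Faith.": 0.89,      # text-modal
    "Coher.": 0.53, "Help.": 0.44, "Ref.": 0.43, "Recall": 0.43,  # multi-modal
}

overall = sum(scores.values()) / len(scores)
print(round(overall, 2))  # 0.63, matching the reported Overall value for this row
```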