notesum.ai

Published on October 23

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

cs.AI

cs.HC

Release Date: October 23, 2024

Authors: Yuxuan Xie¹, Tianhua Li¹, Wenqi Shao², Kaipeng Zhang²

Affiliations: ¹OpenGV Lab, Shanghai Artificial Intelligence Laboratory and School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; ²OpenGV Lab, Shanghai Artificial Intelligence Laboratory

arXiv: https://arxiv.org/abs/2410.18071v1

Model	Original Score	TP-Eval Score	# Improved Tasks	Ratio
LLaVA-1.5-7B	50.4	54.4	32	25.1%
DeepSeek-VL-7B	55.2	57.3	21	23.3%
Mini-InternVL-Chat-4B-V1-5	54.6	56.9	16	40.4%