TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
cs.AI
cs.HC
Released Date: October 23, 2024
Authors: Yuxuan Xie¹, Tianhua Li¹, Wenqi Shao², Kaipeng Zhang²
Affiliations: ¹ OpenGV Lab, Shanghai Artificial Intelligence Laboratory and School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; ² OpenGV Lab, Shanghai Artificial Intelligence Laboratory

| Model | Original Score | TP-Eval Score | # Improved Tasks | Ratio |
|---|---|---|---|---|
| LLaVA-1.5-7B | 50.4 | 54.4 | 32 | 25.1% |
| DeepSeek-VL-7B | 55.2 | 57.3 | 21 | 23.3% |
| Mini-InternVL-Chat-4B-V1-5 | 54.6 | 56.9 | 16 | 40.4% |
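
As a reading aid, the absolute score gain for each model can be derived directly from the two score columns above. The snippet below is a minimal sketch of that arithmetic; the dictionary and function names are illustrative and do not come from the paper, and the snippet does not attempt to reproduce the Ratio column.

```python
# Scores copied from the table above: (original score, TP-Eval score).
# The names `scores` and `absolute_gains` are illustrative only.
scores = {
    "LLaVA-1.5-7B": (50.4, 54.4),
    "DeepSeek-VL-7B": (55.2, 57.3),
    "Mini-InternVL-Chat-4B-V1-5": (54.6, 56.9),
}


def absolute_gains(table):
    """Return the TP-Eval score minus the original score, rounded to one decimal."""
    return {
        model: round(tp_eval - original, 1)
        for model, (original, tp_eval) in table.items()
    }


gains = absolute_gains(scores)
for model, gain in gains.items():
    print(f"{model}: +{gain} points")
```

Under this reading, LLaVA-1.5-7B shows the largest absolute gain (4.0 points), followed by Mini-InternVL-Chat-4B-V1-5 (2.3) and DeepSeek-VL-7B (2.1).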