VP-MEL: Visual Prompts Guided Multimodal Entity Linking
cs.CV
cs.CL
Released Date: December 9, 2024
Authors: Hongze Mi¹, Jinyuan Li², Xuying Zhang¹, Haoran Cheng¹, Jiahao Wang¹, Di Sun³, Gang Pan²
Aff.: ¹College of Intelligence and Computing, Tianjin University; ²School of New Media and Communication, Tianjin University; ³Tianjin University of Science and Technology

| Methods | VP-MEL H@1 | VP-MEL H@3 | VP-MEL H@5 |
| --- | --- | --- | --- |
| BLIP-2-xl (Li et al., 2023) | 15.86 | 35.41 | 45.32 |
| BLIP-2-xxl (Li et al., 2023) | 21.90 | 37.31 | 49.70 |
| mPLUG-Owl3-7b (Ye et al., 2023) | 29.46 | 30.45 | 48.94 |
| LLaVA-1.5-7b (Liu et al., 2024) | 43.20 | 64.35 | 65.71 |
| LLaVA-1.5-13b (Liu et al., 2024) | 32.93 | 65.56 | 66.92 |
| MiniGPT-4-7b (Zhu et al., 2024) | 28.10 | 33.53 | 37.31 |
| MiniGPT-4-13b (Zhu et al., 2024) | 37.61 | 37.61 | 40.03 |
| VELML (Zheng et al., 2022) | 22.51 | 37.61 | 43.35 |
| GHMFC (Wang et al., 2022a) | 25.53 | 41.39 | 48.94 |
| MIMIC (Luo et al., 2023) | 24.62 | 42.35 | 49.25 |
| MELOV (Song et al., 2024) | 26.44 | 42.75 | 51.51 |
| FBMEL (ours) | 48.34 | 67.53 | 77.50 |
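
The page does not define the H@1, H@3 and H@5 columns. Assuming H@k denotes the usual Hits@k metric from entity linking (the percentage of mentions whose gold entity appears among the top k ranked candidates), a minimal sketch of the computation could look like the following. The function name `hits_at_k` and the toy entity IDs are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def hits_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Return the percentage of mentions whose gold entity is in the top-k candidates.

    ranked_candidates[i] is the candidate list for mention i, best first.
    gold[i] is the correct entity ID for mention i.
    """
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one ranked list per gold entity")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold:
        return 0.0
    hits = sum(
        1 for candidates, answer in zip(ranked_candidates, gold)
        if answer in candidates[:k]
    )
    return 100.0 * hits / len(gold)


# Toy data with hypothetical entity IDs (not from the VP-MEL dataset).
rankings = [
    ["Q1", "Q2", "Q3"],     # gold at rank 1
    ["Q5", "Q4", "Q6"],     # gold at rank 2
    ["Q7", "Q8", "Q9"],     # gold at rank 3
    ["Q10", "Q11", "Q12"],  # gold not retrieved
]
gold_entities = ["Q1", "Q4", "Q9", "Q99"]

scores = {k: hits_at_k(rankings, gold_entities, k) for k in (1, 3, 5)}
print(scores)
```

Under this definition the score cannot decrease as k grows, since any mention counted at k is also counted at every larger k.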