Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
Categories: cs.CV, cs.AI, cs.CL, cs.LG
Released Date: October 30, 2024
Authors: Junjie Wu, Tsz Ting Chung, Kai Chen, Dit-Yan Yeung
Affiliation: The Hong Kong University of Science and Technology

| Method | GPT-4 Judge: Overall | | GPT-4 Judge: Object | | GPT-4 Judge: Relation | | NLI Judge: Overall | |
|---|---|---|---|---|---|---|---|---|
| MiniGPT-4 | 53.60 | 51.79 | 28.32 | 26.77 | 25.25 | 24.98 | 55.61 | 53.36 |
| InstructBLIP | 46.68 | 45.57 | 22.19 | 20.88 | 24.50 | 24.69 | 58.25 | 55.56 |
| LLaVA | 42.34 | 41.30 | 19.88 | 18.50 | 22.46 | 22.80 | 54.49 | 51.51 |
| Shikra | 42.20 | 41.76 | 18.55 | 17.54 | 23.65 | 24.22 | 56.46 | 53.98 |
| LLaVA-1.5 | 40.66 | 39.10 | 18.63 | 17.28 | 22.03 | 21.82 | 54.14 | 51.67 |
| LLaMA-3.2 | 40.16 | 38.95 | 22.30 | 21.08 | 17.86 | 17.87 | 48.46 | 45.64 |
| InternLM2 | 38.83 | 37.54 | 18.25 | 17.50 | 20.58 | 20.04 | 54.41 | 52.08 |