JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
cs.LG
cs.AI
cs.NE
Released Date: October 22, 2024
Authors: Shota Onohara¹, Atsuyuki Miyai¹, Yuki Imajuku¹, Kazuki Egashira¹, Jeonghun Baek¹, Xiang Yue², Graham Neubig², Kiyoharu Aizawa¹
Affiliations: ¹The University of Tokyo; ²Carnegie Mellon University

| Model | | | |
|---|---|---|---|
| LLaVA-1.6-13B | 26.4 | 31.9 (+5.5) | 29.2 (+2.8) |
| Phi-3.5v | 39.2 | 33.6 (-5.6) | 31.1 (-8.1) |
| LLaVA-CALM2 | 29.4 | 28.3 (-1.1) | 31.4 (+2.0) |
| CogVLM2-19B | 32.8 | 31.9 (-0.9) | 34.4 (+1.6) |
| EvoVLM JP v2 | 30.0 | 30.8 (+0.8) | 28.6 (-1.4) |
| InternVL2-8B | 43.9 | 38.3 (-5.6) | 37.2 (-6.7) |
| LLaVA-1.6-34B | 43.6 | 40.8 (-2.8) | 38.9 (-4.7) |
| LLaVA-OV-7B | 45.0 | 38.3 (-6.7) | 35.6 (-9.4) |
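The column headers of this table did not survive, but the parenthesized values are consistent with each score's difference from the first score column. A minimal Python sketch of that arithmetic follows. It assumes this reading of the table, and the names `scores`, `deltas`, and `table_deltas` are illustrative only, not taken from the paper.

```python
# Scores copied from the table above, in column order.
# Assumption: the first score column is the baseline for the
# parenthesized differences; the original headers are missing.
scores = {
    "LLaVA-1.6-13B": (26.4, 31.9, 29.2),
    "Phi-3.5v": (39.2, 33.6, 31.1),
    "LLaVA-CALM2": (29.4, 28.3, 31.4),
    "CogVLM2-19B": (32.8, 31.9, 34.4),
    "EvoVLM JP v2": (30.0, 30.8, 28.6),
    "InternVL2-8B": (43.9, 38.3, 37.2),
    "LLaVA-1.6-34B": (43.6, 40.8, 38.9),
    "LLaVA-OV-7B": (45.0, 38.3, 35.6),
}


def deltas(row):
    """Return the second and third scores minus the first, to one decimal."""
    base, second, third = row
    return (round(second - base, 1), round(third - base, 1))


# Hypothetical helper mapping: model name -> (delta of column 2, delta of column 3).
table_deltas = {model: deltas(row) for model, row in scores.items()}

for model, (d2, d3) in table_deltas.items():
    # The "+" format flag prints an explicit sign, as in the table.
    print(f"{model:<14} {d2:+.1f} {d3:+.1f}")
```

Under this reading, each printed pair reproduces the parenthesized values in the corresponding table row.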