Cross-Modal Consistency in Multimodal Large Language Models
cs.CL
cs.AI
Released Date: November 14, 2024
Authors: Xiang Zhang¹, Senyu Li², Ning Shi², Bradley Hauer², Zijun Wu², Grzegorz Kondrak², Muhammad Abdul-Mageed³, Laks V. S. Lakshmanan
Affiliations: ¹University of Alberta; ²Alberta Machine Intelligence Institute, Dept. of Computing Science, University of Alberta; ³University of British Columbia

| Task | Modality | Accuracy | Consistency |
|---|---|---|---|
| MES (Easy) | Text | 0.44 | 0.72 |
| | Image | 0.24 | |
| MES (Hard) | Text | 0.62 | 0.62 |
| | Image | 0.28 | |
| LogicQA | Text | 0.64 | 0.64 |
| | Image | 0.44 | |
| MMLU | Text | 1.00 | 0.74 |
| | Image | 0.74 | |
| TU | Text | 0.93 | 0.10 |
| | Image | 0.03 | |
| MR | Text | 0.40 | 0.92 |
| | Image | 0.36 | |
| State Machine | Text | 0.34 | 0.67 |
| | Image | 0.28 | |
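
This excerpt does not say how the Consistency column is computed. One plausible reading, assumed here and not confirmed by the text above, is the fraction of task instances on which the model gives the same answer for the text input and the image input, whether or not that answer is correct. The sketch below shows that reading next to per-modality accuracy. The function names and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one instance")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def consistency(text_preds: Sequence[str], image_preds: Sequence[str]) -> float:
    """ASSUMED definition: fraction of instances where the answer given for
    the text input equals the answer given for the image input, regardless
    of whether either answer is correct."""
    if len(text_preds) != len(image_preds):
        raise ValueError("both modalities must cover the same instances")
    if not text_preds:
        raise ValueError("need at least one instance")
    agree = sum(t == i for t, i in zip(text_preds, image_preds))
    return agree / len(text_preds)


# Hypothetical toy data (not from the paper): five multiple-choice instances.
gold = ["A", "B", "C", "D", "A"]
text_preds = ["A", "B", "C", "A", "B"]   # answers when the task is given as text
image_preds = ["A", "C", "C", "A", "B"]  # answers when the task is given as an image

text_acc = accuracy(text_preds, gold)
image_acc = accuracy(image_preds, gold)
cons = consistency(text_preds, image_preds)

print(f"text accuracy:  {text_acc:.2f}")
print(f"image accuracy: {image_acc:.2f}")
print(f"consistency:    {cons:.2f}")
```

Under this assumed definition, agreement on a wrong answer still counts as consistent, so the score can exceed both accuracies. That would fit the MR row, where consistency is 0.92 while the two accuracies are 0.40 and 0.36.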