ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
Categories: cs.CL, cs.AI, cs.HC
MSC classes: 68T50 (Primary), 68T37 (Secondary)
ACM classes: I.2.7; I.2.1
Released Date: October 18, 2024
Authors: Jingqi Zhou¹, Sheng Wang¹, Jingwei Dong², Lei Li¹, Jiahui Gao¹, Lingpeng Kong¹, Chuan Wu¹
Affiliations: ¹The University of Hong Kong; ²Xi'an Jiaotong University

| Model | Method | MME | MMMU | MathVista | Hallu. |
|---|---|---|---|---|---|
| Llama3-LLaVA-NeXT-8B | Direct | 61.5 | 41.8 | 37.1 | 45.8 |
| | VDGD | 68.8 (+7.3) | 42.3 (+0.5) | 36.1 (-1.0) | 44.2 (-1.6) |
| | CCoT | 68.9 (+7.4) | 40.5 (-1.3) | 36.8 (-0.3) | 37.4 (-8.4) |
| | CoT | 58.8 (-2.7) | 41.5 (-0.3) | 35.9 (-1.2) | 43.1 (-2.7) |
| | ReAct | 68.5 (+7.0) | 46.7 (+4.9) | 31.7 (-5.4) | 43.6 (-2.2) |
| | ProReason | 71.5 (+10.0) | 52.5 (+10.7) | 38.8 (+1.7) | 50.9 (+5.1) |
| | Average | 66.30 | 44.22 | 36.06 | 44.16 |
| GPT-4o-mini | Direct | 79.2 | 48.4 | 53.0 | 56.0 |
| | VDGD | 82.3 (+3.1) | 51.4 (+3.0) | 51.2 (-1.8) | 52.4 (-3.6) |
| | CCoT | 80.8 (+1.6) | 54.2 (+5.8) | 53.6 (+0.6) | 56.7 (+0.7) |
| | CoT | 87.8 (+8.6) | 58.5 (+10.1) | 53.8 (+0.8) | 56.3 (+0.3) |
| | ReAct | 87.3 (+8.1) | 54.8 (+6.4) | 49.3 (-3.7) | 51.1 (-4.9) |
| | ProReason | 90.4 (+11.2) | 61.6 (+13.2) | 54.9 (+1.9) | 58.9 (+2.9) |
| | Average | 84.63 | 54.82 | 52.63 | 55.23 |
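
The values in parentheses are consistent with a simple difference between each method's score and the Direct score of the same model on the same benchmark. The page does not state this explicitly, so treat it as an inference from the arithmetic. A minimal Python sketch that reproduces those differences for the Llama3-LLaVA-NeXT-8B rows is below. The function name `deltas_vs_baseline` and the data layout are illustrative choices, not something defined by the paper.

```python
# Scores copied from the Llama3-LLaVA-NeXT-8B rows of the table above.
BENCHMARKS = ["MME", "MMMU", "MathVista", "Hallu."]

scores = {
    "Direct":    [61.5, 41.8, 37.1, 45.8],
    "VDGD":      [68.8, 42.3, 36.1, 44.2],
    "CCoT":      [68.9, 40.5, 36.8, 37.4],
    "CoT":       [58.8, 41.5, 35.9, 43.1],
    "ReAct":     [68.5, 46.7, 31.7, 43.6],
    "ProReason": [71.5, 52.5, 38.8, 50.9],
}


def deltas_vs_baseline(scores, baseline="Direct"):
    """Return {method: [score - baseline score, ...]}, rounded to one decimal.

    Assumption: the parenthesized numbers in the table are plain differences
    against the baseline row (hypothetical helper, not from the paper).
    """
    base = scores[baseline]
    return {
        method: [round(s - b, 1) for s, b in zip(vals, base)]
        for method, vals in scores.items()
        if method != baseline
    }


deltas = deltas_vs_baseline(scores)
for method, row in deltas.items():
    cells = ", ".join(f"{name} {d:+.1f}" for name, d in zip(BENCHMARKS, row))
    print(f"{method}: {cells}")
```

Swapping in the GPT-4o-mini scores checks the second half of the table the same way.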