Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Categories: cs.AI, cs.CV
Release Date: November 20, 2024
Authors: Shantanu Jaiswal¹, Debaditya Roy², Basura Fernando², Cheston Tan³
Affiliations: ¹Carnegie Mellon University; ²IHPC, A*STAR Singapore; ³Centre for Frontier AI Research, A*STAR Singapore

| Model | Interaction | Sequence | Prediction | Feasibility | Average |
|---|---|---|---|---|---|
| LRR* [5] | 73.7 | 71.0 | 71.3 | 65.1 | 70.3 |
| LRR (w/o surrogate) | 54.5 | 48.7 | 44.3 | 45.5 | 48.2 |
| All-in-One [72] | 47.5 | 50.8 | 47.7 | 44.0 | 47.5 |
| Temp[ATP] [7] | 50.6 | 52.8 | 49.3 | 40.6 | 48.3 |
| MIST [18] | 55.5 | 54.2 | 54.2 | 44.4 | 51.1 |
| InternVideo (8) [75] | 62.7 | 65.6 | 54.9 | 51.9 | 58.7 |
| SeViLA-BLIP2 [86] | 63.7 | 70.4 | 63.1 | 62.4 | 64.9 |
| Concat-Att-4L | 68.1 | 71.4 | 66.6 | 55.2 | 65.3 |
| Cross-Att-4L | 67.5 | 72.1 | 64.4 | 58.5 | 65.6 |
| IPRM | 71.8 | 77.7 | 71.0 | 59.1 | 69.9 |