VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
cs.CV
cs.AI
cs.CL
Released Date: December 3, 2024
Authors: Xueqing Wu¹, Yuheng Ding, Bingxuan Li, Pan Lu², Da Yin, Kai-Wei Chang, Nanyun Peng
Aff.: ¹University of California, Los Angeles; ²Stanford

| | Total | Incorrect answer | Correct answer, some incorrect steps | Correct answer, no incorrect steps |
|---|---|---|---|---|
| Questions | 1645 | 502 | 179 | 964 |
| - Incorrect answers | 502 | 502 | 0 | 0 |
| - Correct answers | 1143 | 0 | 179 | 964 |
| Unique images | 1587 | 494 | 178 | 940 |
| Steps per question | 3.4 | 3.4 | 3.6 | 3.4 |
| Steps in total | 5604 | 1713 | 650 | 3241 |
| - Incorrect steps | 1346 | 1046 | 300 | 0 |
| - Correct steps | 4258 | 667 | 350 | 3241 |
| Average question length | 33.5 | 34.7 | 35.5 | 32.5 |
| Average step length | 19.4 | 20.9 | 20.2 | 18.5 |
| Average critique length | 17.0 | 17.2 | 16.1 | - |
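
The counts in the table can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers as transcribed here, not code from the paper. The dictionary keys and helper names are hypothetical, and each tuple lists the Total column followed by the three subset columns in left-to-right order.

```python
# Counts transcribed from the statistics table.
# Tuple order: (total, subset 1, subset 2, subset 3), left to right.
# Key names are illustrative labels, not identifiers from the paper.
stats = {
    "questions": (1645, 502, 179, 964),
    "incorrect_answers": (502, 502, 0, 0),
    "correct_answers": (1143, 0, 179, 964),
    "steps_total": (5604, 1713, 650, 3241),
    "incorrect_steps": (1346, 1046, 300, 0),
    "correct_steps": (4258, 667, 350, 3241),
}
reported_steps_per_question = (3.4, 3.4, 3.6, 3.4)


def subsets_sum_to_total(row):
    """True if the three subset counts add up to the Total column."""
    total, *subsets = row
    return total == sum(subsets)


def parts_match(whole, part_a, part_b):
    """True if, in every column, the two sub-rows add up to the parent row."""
    return all(w == a + b for w, a, b in zip(whole, part_a, part_b))


# Each count row should be additive across the three subsets.
additive_ok = {name: subsets_sum_to_total(row) for name, row in stats.items()}

# Incorrect + correct should reproduce the parent row in every column.
answers_ok = parts_match(
    stats["questions"], stats["incorrect_answers"], stats["correct_answers"]
)
steps_ok = parts_match(
    stats["steps_total"], stats["incorrect_steps"], stats["correct_steps"]
)

# Steps per question, recomputed from the raw counts and rounded to one decimal.
derived_steps_per_question = tuple(
    round(steps / questions, 1)
    for steps, questions in zip(stats["steps_total"], stats["questions"])
)

print(additive_ok)
print(answers_ok, steps_ok)
print(derived_steps_per_question)
```

The "Unique images" row is deliberately left out of the additivity check: the subset values (494, 178, 940) sum to more than the reported total of 1587, which would be expected if some images appear in more than one subset, although the table itself does not say so.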