
Published on December 3

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Categories: cs.CV, cs.AI, cs.CL

Released Date: December 3, 2024

Authors: Xueqing Wu¹, Yuheng Ding, Bingxuan Li, Pan Lu², Da Yin, Kai-Wei Chang, Nanyun Peng

Affiliations: ¹University of California, Los Angeles; ²Stanford University

arXiv: http://arxiv.org/pdf/2412.02172v1

| | Total | Outcome Error | Process Error | No Error |
|---|---|---|---|---|
| Questions | 1645 | 502 | 179 | 964 |
| - Incorrect answers | 502 | 502 | - | - |
| - Correct answers | 1143 | - | 179 | 964 |
| Unique images | 1587 | 494 | 178 | 940 |
| Steps per question | 3.4 | 3.4 | 3.6 | 3.4 |
| Steps in total | 5604 | 1713 | 650 | 3241 |
| - Incorrect steps | 1346 | 1046 | 300 | - |
| - Correct steps | 4258 | 667 | 350 | 3241 |
| Average question length | 33.5 | 34.7 | 35.5 | 32.5 |
| Average step length | 19.4 | 20.9 | 20.2 | 18.5 |
| Average critique length | 17.0 | 17.2 | 16.1 | - |
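
The counts in these statistics are related by simple arithmetic: the three subsets should add up to the total, and incorrect plus correct steps should equal the step count in each column. The following is a minimal sketch of that consistency check, not code from the paper. The dictionary keys are illustrative names, including `no_error` for the error-free fourth column, and the `0` incorrect steps for that column is inferred (1346 - 1046 - 300) rather than stated in the source.

```python
# Counts copied from the dataset statistics. Key names are illustrative;
# "no_error" is an assumed label for the error-free fourth column.
questions = {"total": 1645, "outcome_error": 502, "process_error": 179, "no_error": 964}
steps = {"total": 5604, "outcome_error": 1713, "process_error": 650, "no_error": 3241}
# The 0 below is inferred from the totals, not read from the source.
incorrect_steps = {"total": 1346, "outcome_error": 1046, "process_error": 300, "no_error": 0}
correct_steps = {"total": 4258, "outcome_error": 667, "process_error": 350, "no_error": 3241}

SUBSETS = ("outcome_error", "process_error", "no_error")


def subsets_sum_to_total(row):
    """Return True if the three subset columns add up to the Total column."""
    return sum(row[name] for name in SUBSETS) == row["total"]


def steps_are_partitioned(column):
    """Return True if incorrect + correct steps equal all steps in a column."""
    return incorrect_steps[column] + correct_steps[column] == steps[column]


def steps_per_question(column):
    """Average number of reasoning steps per question, to one decimal place."""
    return round(steps[column] / questions[column], 1)


if __name__ == "__main__":
    for name, row in [("questions", questions), ("steps", steps),
                      ("incorrect_steps", incorrect_steps),
                      ("correct_steps", correct_steps)]:
        print(name, "subsets sum to total:", subsets_sum_to_total(row))
    for column in ("total",) + SUBSETS:
        print(column, "steps partitioned:", steps_are_partitioned(column),
              "| steps per question:", steps_per_question(column))
```

Under these assumptions every check passes, and the recomputed steps per question round to 3.4 (total), 3.4 (outcome error), 3.6 (process error), and 3.4 (error-free).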