AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
Categories: cs.CV, cs.AI, cs.GR, cs.LG
Released Date: October 28, 2024
Authors: Han Bao¹, Yue Huang¹, Yanbo Wang², Jiayi Ye², Xiangqi Wang¹, Xiuying Chen², Mohamed Elhoseiny³, Xiangliang Zhang¹
Affiliations: ¹University of Notre Dame; ²MBZUAI; ³KAUST

| User Input | Easy | Medium | Hard |
|---|---|---|---|
| Basic understanding | 90.59% | 75.90% | 63.33% |
| Spatial understanding | 82.46% | 69.14% | 63.00% |
| Semantic understanding | 91.28% | 79.84% | 74.52% |
| Reasoning capacity | 86.50% | 74.67% | 68.97% |
| Atmospheric understanding | 94.76% | 83.33% | 75.66% |