RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models
Categories: cs.CV, cs.AI
Released Date: December 10, 2024
Authors: Greg Heinrich1, Mike Ranzinger1, Hongxu Yin1, Yao Lu1, Jan Kautz1, Andrew Tao1, Bryan Catanzaro1, Pavlo Molchanov1
Aff.: 1NVIDIA

| Layers | Aggregation | Head | ADE20k | Depth | Surface Normals | Overall |
|---|---|---|---|---|---|---|
| 31 | N/A | Linear | 52.47 | 82.9 | 57.0 | 61.215 |
| 7-15-23-31 | Sparse | Linear | 52.99 | 82.5 | 59.6 | 62.03 |
| (0-9)-(10-19)-(20-30)-31 | Dense | Linear | 52.96 | 82.7 | 59.5 | 62.03 |
| 15-31 | Sparse | Linear | 52.90 | 83.1 | 59.6 | 62.12 |
| (0-15)-(16-30)-31 | Dense | DPT | 54.27 | 85.4 | 60.7 | 63.65 |
| 15-31 | Sparse | DPT | 54.58 | 84.6 | 61.0 | 63.70 |
| 7-15-23-31 | Sparse | DPT | 55.19 | 85.9 | 61.6 | 64.46 |
| (0-9)-(10-19)-(20-30)-31 | Dense | DPT | 54.28 | 85.5 | 62.3 | 64.08 |
| 3-7-11-15-19-23-27-31 | Sparse | DPT | 54.42 | 86.7 | 62.8 | 64.58 |
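
The excerpt does not define the Overall column. The listed values are consistent, to within about 0.01, with averaging the ADE20k score with the mean of the Depth and Surface Normals scores. The sketch below checks that reading against the nine rows above. The formula and the function name `overall` are assumptions inferred from the numbers, not something the source states.

```python
# Rows copied from the table above:
# (ADE20k, Depth, Surface Normals, listed Overall).
ROWS = [
    (52.47, 82.9, 57.0, 61.215),
    (52.99, 82.5, 59.6, 62.03),
    (52.96, 82.7, 59.5, 62.03),
    (52.90, 83.1, 59.6, 62.12),
    (54.27, 85.4, 60.7, 63.65),
    (54.58, 84.6, 61.0, 63.70),
    (55.19, 85.9, 61.6, 64.46),
    (54.28, 85.5, 62.3, 64.08),
    (54.42, 86.7, 62.8, 64.58),
]


def overall(ade20k, depth, normals):
    """Assumed formula: mean of ADE20k and the mean of the two geometry scores."""
    return (ade20k + (depth + normals) / 2) / 2


# Largest gap between the assumed formula and the listed Overall values.
max_gap = max(abs(overall(a, d, n) - listed) for a, d, n, listed in ROWS)
print(f"max gap: {max_gap:.3f}")
```

On these rows the gap stays at roughly 0.01 or less, which is in line with the inputs having been rounded for display. A different weighting could fit equally well, so treat this as a plausibility check rather than a definition.
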

| Layers | Aggregation | TextVQA | ChartQA | DocVQA | InfoVQA | OCRBench |
|---|---|---|---|---|---|---|
| Last | N/A | 63.6 | 23.4 | 47.0 | 33.8 | 42.0 |
| 7-15-23-31 | Sparse | 63.2 | 24.1 | 47.2 | 34.3 | 40.3 |
| (0-9)-(10-19)-(20-30)-31 | Dense | 63.5 | 23.1 | 47.0 | 33.5 | 40.2 |
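
The Layers column in both tables uses a compact notation: a bare number such as `15` names a single layer, and a parenthesised pair such as `(0-9)` names an inclusive range of layers. The sketch below parses that notation and combines per-layer features group by group. It assumes that "Dense" means averaging the features of every layer in a range, and that "Sparse" means taking the listed layers as they are. The excerpt does not state the reduction, so the averaging step, the helper names `parse_layers` and `aggregate`, and the toy features are all hypothetical.

```python
import re


def parse_layers(spec):
    """Turn a Layers cell into a list of layer-index groups."""
    groups = []
    for m in re.finditer(r"\((\d+)-(\d+)\)|(\d+)", spec):
        if m.group(3) is not None:
            # Bare number: a single selected layer.
            groups.append([int(m.group(3))])
        else:
            # "(lo-hi)": an inclusive range of layers.
            lo, hi = int(m.group(1)), int(m.group(2))
            groups.append(list(range(lo, hi + 1)))
    return groups


def aggregate(layer_outputs, spec):
    """Average feature vectors within each group (assumed reduction)."""
    out = []
    for group in parse_layers(spec):
        vecs = [layer_outputs[i] for i in group]
        out.append([sum(col) / len(vecs) for col in zip(*vecs)])
    return out


# Toy data: 32 layers, each emitting a 2-dim feature equal to its own index.
layer_outputs = [[float(i), float(i)] for i in range(32)]

sparse = aggregate(layer_outputs, "7-15-23-31")
dense = aggregate(layer_outputs, "(0-9)-(10-19)-(20-30)-31")
```

Both specs yield four feature groups, so a downstream head would receive the same number of inputs in either case. Under this reading the only difference is whether each input comes from one layer or summarises a block of layers.
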