SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Categories: cs.CV, cs.AI, cs.LG
MSC class: 68T07
ACM class: I.2.10
Released Date: November 14, 2024
Authors: Shravan Venkatraman¹, Jaskaran Singh Walia¹, Joe Dhanith P R¹
Affiliation: ¹Vellore Institute of Technology, Chennai, India

| Backbone | CIFAR-10 | GTSRB | NCT-CRC-HE-100K | NWPU-RESISC45 | PlantVillage |
|---|---|---|---|---|---|
| DenseNet201 | 0.5427 | 0.9862 | 0.9214 | 0.4493 | 0.8725 |
| VGG16 | 0.5345 | 0.8180 | 0.8234 | 0.4114 | 0.7064 |
| VGG19 | 0.5307 | 0.7551 | 0.8178 | 0.3844 | 0.6811 |
| DenseNet121 | 0.5290 | 0.9813 | 0.9247 | 0.4381 | 0.8321 |
| AlexNet | 0.6126 | 0.9059 | 0.8743 | 0.4397 | 0.7684 |
| Inception | 0.7734 | 0.8934 | 0.8707 | 0.8707 | 0.8216 |
| ResNet | 0.9172 | 0.9134 | 0.9478 | 0.9103 | 0.8905 |
| MobileNet | 0.9169 | 0.3006 | 0.4965 | 0.1667 | 0.2213 |
| ViT-S | 0.8465 | 0.8542 | 0.8234 | 0.6116 | 0.8654 |
| ViT-L | 0.8637 | 0.8613 | 0.8345 | 0.8358 | 0.8842 |
| MNASNet1_0 | 0.1032 | 0.0024 | 0.0212 | 0.0011 | 0.0049 |
| ShuffleNet_V2_x1_0 | 0.3523 | 0.4244 | 0.4598 | 0.1808 | 0.3190 |
| SqueezeNet1_0 | 0.4328 | 0.8392 | 0.7843 | 0.3913 | 0.6638 |
| GoogLeNet | 0.4954 | 0.9455 | 0.8631 | 0.3720 | 0.7726 |
| Proposed | 0.9574 | 0.9958 | 0.9861 | 0.9549 | 0.9772 |
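
To read the table as a comparison, it can help to find the strongest baseline on each dataset and the gap between it and the proposed model. The following is a minimal sketch of that arithmetic, not part of the paper. The scores are copied from the table as given; the source does not name the metric, so the code treats them as generic "higher is better" scores. The names `DATASETS`, `SCORES`, and `best_baseline_margins` are hypothetical and exist only for this illustration.

```python
# Scores copied from the table above. The source does not state the metric,
# so they are treated here as generic "higher is better" scores.
DATASETS = ["CIFAR-10", "GTSRB", "NCT-CRC-HE-100K", "NWPU-RESISC45", "PlantVillage"]

SCORES = {
    "DenseNet201":        [0.5427, 0.9862, 0.9214, 0.4493, 0.8725],
    "VGG16":              [0.5345, 0.8180, 0.8234, 0.4114, 0.7064],
    "VGG19":              [0.5307, 0.7551, 0.8178, 0.3844, 0.6811],
    "DenseNet121":        [0.5290, 0.9813, 0.9247, 0.4381, 0.8321],
    "AlexNet":            [0.6126, 0.9059, 0.8743, 0.4397, 0.7684],
    "Inception":          [0.7734, 0.8934, 0.8707, 0.8707, 0.8216],
    "ResNet":             [0.9172, 0.9134, 0.9478, 0.9103, 0.8905],
    "MobileNet":          [0.9169, 0.3006, 0.4965, 0.1667, 0.2213],
    "ViT-S":              [0.8465, 0.8542, 0.8234, 0.6116, 0.8654],
    "ViT-L":              [0.8637, 0.8613, 0.8345, 0.8358, 0.8842],
    "MNASNet1_0":         [0.1032, 0.0024, 0.0212, 0.0011, 0.0049],
    "ShuffleNet_V2_x1_0": [0.3523, 0.4244, 0.4598, 0.1808, 0.3190],
    "SqueezeNet1_0":      [0.4328, 0.8392, 0.7843, 0.3913, 0.6638],
    "GoogLeNet":          [0.4954, 0.9455, 0.8631, 0.3720, 0.7726],
    "Proposed":           [0.9574, 0.9958, 0.9861, 0.9549, 0.9772],
}


def best_baseline_margins(scores, proposed="Proposed"):
    """Return {dataset: (best_baseline_name, best_baseline_score, margin)}.

    margin = proposed score minus the best baseline score, rounded to 4 places.
    """
    result = {}
    for i, dataset in enumerate(DATASETS):
        # Pick the highest-scoring row for this column, excluding the proposed model.
        name, best = max(
            ((model, row[i]) for model, row in scores.items() if model != proposed),
            key=lambda pair: pair[1],
        )
        result[dataset] = (name, best, round(scores[proposed][i] - best, 4))
    return result


if __name__ == "__main__":
    for dataset, (name, best, margin) in best_baseline_margins(SCORES).items():
        print(f"{dataset}: best baseline {name} ({best:.4f}), margin {margin:+.4f}")
```

On these numbers, the closest baseline is ResNet on four of the five datasets and DenseNet201 on GTSRB.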