UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices
cs.CV
Released Date: December 3, 2024

| Model | Base architecture | Top-1 acc. (%) | Top-5 acc. (%) | GPU throughput (images/s) | CPU throughput (images/s) | FLOPs (M) | Params (M) |
|---|---|---|---|---|---|---|---|
| EfficientViT-M0 | Transformer | 63.2 | 85.4 | 64293 | 450 | 79 | 2.3 |
| UniForm-t | Transformer | 66.0 | 86.6 | 77625 | 544 | 74 | 1.8 |
| MobileNetV3-small | CNN | 67.4 | 87.4 | 41965 | 360 | 57 | 2.5 |
| EfficientViT-M1 | Transformer | 68.4 | 88.7 | 47045 | 220 | 167 | 3.0 |
| MobileViT-XXS | Transformer | 69.0 | 88.9 | 9663 | 59 | 410 | 1.3 |
| ShuffleNetV2 1.0x | CNN | 69.4 | 88.9 | 27277 | 138 | 146 | 2.3 |
| UniForm-s | Transformer | 70.1 | 89.3 | 50582 | 231 | 164 | 2.4 |
| EdgeNeXt-XXS | Both | 71.2 | - | 13051 | 121 | 261 | 1.3 |
| MobileOne-S0 | CNN | 71.4 | 89.8 | 20642 | 26 | 275 | 2.1 |
| Mixer-B/16 | MLP | 71.7 | - | 2057 | 6 | 12610 | 59.8 |
| RepVGG-A0 | CNN | 72.4 | - | 19450 | 61 | 1366 | 8.3 |
| EfficientViT-M3 | Transformer | 73.4 | 91.4 | 34427 | 166 | 263 | 6.9 |
| ViG-Ti | GNN | 73.9 | 92.0 | 1406 | 6 | 1300 | 7.1 |
| UniForm-m | Transformer | 74.1 | 91.9 | 36507 | 174 | 251 | 5.6 |
| RepVGG-A1 | CNN | 74.4 | - | 14155 | 39 | 2362 | 12.7 |
| DeiT-Tiny (distilled) | Transformer | 74.5 | - | 13785 | 63 | 1085 | 5.9 |
| MobileViT-XS | Transformer | 74.7 | 92.3 | 6098 | 13 | 986 | 2.3 |
| ShuffleNetV2 2.0x | CNN | 74.9 | 92.4 | 12910 | 67 | 591 | 7.4 |
| EdgeNeXt-XS | Both | 75.0 | - | 8312 | 69 | 538 | 2.3 |
| RepVGG-B0 | CNN | 75.1 | - | 10868 | 30 | 15824 | 14.3 |
| MobileNetV3-large | CNN | 75.2 | 91.3 | 14798 | 69 | 217 | 5.4 |
| MobileOne-S1 | CNN | 75.9 | 92.5 | 12150 | 22 | 825 | 4.8 |
| ConvNeXtV2-Atto | CNN | 76.2 | 93.0 | 9120 | 73 | 552 | 3.7 |
| Mixer-L/16 | MLP | 76.4 | - | 688 | 2 | 44570 | 208.2 |
| RepVGG-A2 | CNN | 76.4 | - | 8483 | 20 | 5123 | 25.4 |
| UniForm-l | Transformer | 76.7 | 93.2 | 25356 | 113 | 467 | 10.0 |
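
The rows above are easier to read as relative figures. The sketch below is a minimal, illustrative calculation, not something taken from the paper: it copies a few rows from the table and reports the Top-1 gain and the GPU/CPU throughput ratios of one model over another. The `Row` class, the `compare` helper, and the choice of which models to pair are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row, reduced to the columns used below (hypothetical helper type)."""
    top1: float  # Top-1 accuracy
    gpu: int     # GPU throughput (images/s)
    cpu: int     # CPU throughput (images/s)


# Values copied verbatim from the table above.
ROWS = {
    "EfficientViT-M0": Row(63.2, 64293, 450),
    "UniForm-t": Row(66.0, 77625, 544),
    "EfficientViT-M1": Row(68.4, 47045, 220),
    "UniForm-s": Row(70.1, 50582, 231),
    "EfficientViT-M3": Row(73.4, 34427, 166),
    "UniForm-m": Row(74.1, 36507, 174),
}


def compare(model: str, baseline: str) -> dict:
    """Return the accuracy gain and throughput ratios of `model` over `baseline`."""
    a, b = ROWS[model], ROWS[baseline]
    return {
        "top1_gain": round(a.top1 - b.top1, 1),      # difference in accuracy points
        "gpu_speedup": round(a.gpu / b.gpu, 2),      # ratio of images/s on GPU
        "cpu_speedup": round(a.cpu / b.cpu, 2),      # ratio of images/s on CPU
    }


if __name__ == "__main__":
    # The pairings below are an assumption: each UniForm variant is matched with
    # the EfficientViT variant closest to it in the table.
    for model, baseline in [
        ("UniForm-t", "EfficientViT-M0"),
        ("UniForm-s", "EfficientViT-M1"),
        ("UniForm-m", "EfficientViT-M3"),
    ]:
        print(model, "vs", baseline, compare(model, baseline))
```

For example, `compare("UniForm-t", "EfficientViT-M0")` gives a Top-1 gain of 2.8 points with throughput ratios of about 1.21 on both GPU and CPU, using only the figures listed in the table.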