ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
Categories: cs.CV, cs.AI, cs.LG
Released Date: November 11, 2024
Authors: Jiawei Fan¹, Chao Li, Xiaolong Liu², Anbang Yao¹
Affiliations: ¹Intel Labs China; ²iMotion Automotive Technology

| Teacher (Top-1 %) | Student (baseline Top-1 %) | Teacher Params (M) | Student Params (M) | Teacher FLOPs (G) | Student FLOPs (G) | Distilled Top-1 (%) | Δ Top-1 (%) |
|---|---|---|---|---|---|---|---|
| Swin-L† (86.24) | MobileNet-V1 (72.10) | 196.53 | 4.23 | 34.04 | 0.58 | 75.15 | +3.05 |
| Swin-L† (86.24) | ResNet-50 (78.64) | 196.53 | 25.56 | 34.04 | 4.12 | 82.03 | +3.39 |
| Swin-L† (86.24) | ConvNeXt-T (82.14) | 196.53 | 28.59 | 34.04 | 4.46 | 84.16 | +2.02 |
| Swin-L† (86.24) | Mixer-S/16 (74.02) | 196.53 | 18.53 | 34.04 | 3.78 | 78.63 | +4.61 |
| Swin-L† (86.24) | Mixer-B/16 (76.44) | 196.53 | 59.88 | 34.04 | 12.61 | 81.96 | +5.52 |
| Swin-L† (86.24) | ViT-S/16 (79.90) | 196.53 | 22.05 | 34.04 | 4.61 | 83.93 | +4.03 |
| Swin-L† (86.24) | Swin-T (81.18) | 196.53 | 28.29 | 34.04 | 4.36 | 83.80 | +2.62 |
| Swin-L† (86.24) | ViT-B/16 (81.80) | 196.53 | 86.57 | 34.04 | 17.58 | 85.53 | +3.73 |
| BEiT-L/14‡ (88.58) | ResNet-50 (78.64) | 304.14 | 25.56 | 81.06 | 4.12 | 82.34 | +3.70 |
| BEiT-L/14‡ (88.58) | Mixer-B/14 (76.62) | 304.14 | 59.88 | 81.06 | 16.45 | 82.89 | +6.27 |
| BEiT-L/14‡ (88.58) | ViT-B/14 (82.02) | 304.14 | 86.57 | 81.06 | 23.09 | 86.43 | +4.41 |