Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
Categories: cs.CV, cs.AI
Release date: October 30, 2024
Authors: Wei Dong¹, Yuan Sun¹, Yiting Yang¹, Xing Zhang¹, Zhijun Lin², Qingsen Yan², Haokui Zhang², Peng Wang³, Yang Yang³, Hengtao Shen⁴
Affiliations: ¹College of Information and Control Engineering, Xi'an University of Architecture and Technology; ²School of Computer Science, Northwestern Polytechnical University; ³School of Computer Science and Engineering, University of Electronic Science and Technology of China; ⁴School of Computer Science and Technology, Tongji University

Task groups: Natural (CIFAR-100 through Sun397), Specialized (Camelyon through Retinopathy), Structured (Clevr-Count through sNORB-Ele).

| Method | CIFAR-100 | Caltech101 | DTD | Flowers102 | Pets | SVHN | Sun397 | Natural Mean | Camelyon | EuroSAT | Resisc45 | Retinopathy | Specialized Mean | Clevr-Count | Clevr-Dist | DMLab | KITTI-Dist | dSpr-Loc | dSpr-Ori | sNORB-Azim | sNORB-Ele | Structured Mean | Mean Total | Params. (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full fine-tuning | 68.9 | 87.7 | 64.3 | 97.2 | 86.9 | 87.4 | 38.8 | 75.9 | 79.7 | 95.7 | 84.2 | 73.9 | 83.4 | 56.3 | 58.6 | 41.7 | 65.5 | 57.5 | 46.7 | 25.7 | 29.1 | 47.6 | 65.6 | 85.80 |
| Linear probing | 63.4 | 85.0 | 63.2 | 97.0 | 86.3 | 36.6 | 51.0 | 68.9 | 78.5 | 87.5 | 68.6 | 74.0 | 77.2 | 34.3 | 30.6 | 33.2 | 55.4 | 12.5 | 20.0 | 9.6 | 19.2 | 26.9 | 52.9 | 0.04 |
| Bias [23] | 72.8 | 87.0 | 59.2 | 97.5 | 85.3 | 59.9 | 51.4 | 73.3 | 78.7 | 91.6 | 72.9 | 69.8 | 78.3 | 61.5 | 55.6 | 32.4 | 55.9 | 66.6 | 40.0 | 15.7 | 25.1 | 44.1 | 62.1 | 0.14 |
| VPT-Shallow [24] | 77.7 | 86.9 | 62.6 | 97.5 | 87.3 | 74.5 | 51.2 | 76.8 | 78.2 | 92.0 | 75.6 | 72.9 | 79.7 | 50.5 | 58.6 | 40.5 | 67.1 | 68.7 | 36.1 | 20.2 | 34.1 | 47.0 | 64.9 | 0.11 |
| VPT-Deep [24] | 78.8 | 90.8 | 65.8 | 98.0 | 88.3 | 78.1 | 49.6 | 78.5 | 81.8 | 96.1 | 83.4 | 68.4 | 82.4 | 68.5 | 60.0 | 46.5 | 72.8 | 73.6 | 47.9 | 32.9 | 37.8 | 55.0 | 69.4 | 0.60 |
| Adapter [22] | 69.2 | 90.1 | 68.0 | 98.8 | 89.9 | 82.8 | 54.3 | 79.0 | 84.0 | 94.9 | 81.9 | 75.5 | 84.1 | 80.9 | 65.3 | 48.6 | 78.3 | 74.8 | 48.5 | 29.9 | 41.6 | 58.5 | 71.4 | 0.16 |
| LoRA [1] | 67.1 | 91.4 | 69.4 | 98.8 | 90.4 | 85.3 | 54.0 | 79.5 | 84.9 | 95.3 | 84.4 | 73.6 | 84.6 | 82.9 | 69.2 | 49.8 | 78.5 | 75.7 | 47.1 | 31.0 | 44.0 | 59.8 | 72.3 | 0.29 |
| AdaptFormer [2] | 70.8 | 91.2 | 70.5 | 99.1 | 90.9 | 86.6 | 54.8 | 80.6 | 83.0 | 95.8 | 84.4 | 76.3 | 84.9 | 81.9 | 64.3 | 49.3 | 80.3 | 76.3 | 45.7 | 31.7 | 41.1 | 58.8 | 72.3 | 0.16 |
| FacT-TK≤32 [26] | 70.6 | 90.6 | 70.8 | 99.1 | 90.7 | 88.6 | 54.1 | 80.6 | 84.8 | 96.2 | 84.5 | 75.7 | 85.3 | 82.6 | 68.2 | 49.8 | 80.7 | 80.8 | 47.4 | 33.2 | 43.0 | 60.7 | 73.2 | 0.07 |
| ARC [3] | 72.2 | 90.1 | 72.7 | 99.0 | 91.0 | 91.9 | 54.4 | 81.6 | 84.9 | 95.7 | 86.7 | 75.8 | 85.8 | 80.7 | 67.1 | 48.7 | 81.6 | 79.2 | 51.0 | 31.4 | 39.9 | 60.0 | 73.4 | 0.13 |
| RLRR [29] | 75.6 | 92.4 | 72.9 | 99.3 | 91.5 | 89.8 | 57.0 | 82.7 | 86.8 | 95.2 | 85.3 | 75.9 | 85.8 | 79.7 | 64.2 | 53.9 | 82.1 | 83.9 | 53.7 | 33.4 | 43.6 | 61.8 | 74.5 | 0.33 |
| HTA | 76.6 | 94.3 | 72.5 | 99.3 | 91.3 | 86.2 | 56.5 | 82.4 | 87.6 | 95.7 | 85.0 | 75.7 | 86.0 | 82.6 | 63.3 | 52.5 | 81.0 | 84.5 | 52.6 | 34.5 | 47.3 | 62.3 | 74.7 | 0.22 |
| SSF [25] | 69.0 | 92.6 | 75.1 | 99.4 | 91.8 | 90.2 | 52.9 | 81.6 | 87.4 | 95.9 | 87.4 | 75.5 | 86.6 | 75.9 | 62.3 | 53.3 | 80.6 | 77.3 | 54.9 | 29.5 | 37.9 | 59.0 | 73.1 | 0.24 |
| ARC* [3] | 71.2 | 90.9 | 75.9 | 99.5 | 92.1 | 90.8 | 52.0 | 81.8 | 87.4 | 96.5 | 87.6 | 76.4 | 87.0 | 83.3 | 61.1 | 54.6 | 81.7 | 81.0 | 57.0 | 30.9 | 41.3 | 61.4 | 74.3 | 0.13 |
| RLRR* [29] | 76.7 | 92.7 | 76.3 | 99.6 | 92.6 | 91.8 | 56.0 | 83.7 | 87.8 | 96.2 | 89.1 | 76.3 | 87.3 | 80.4 | 63.3 | 54.5 | 83.3 | 83.0 | 53.7 | 32.0 | 41.7 | 61.5 | 75.1 | 0.33 |
| HTA* | 79.0 | 92.8 | 77.6 | 99.6 | 92.4 | 89.4 | 55.1 | 83.7 | 88.2 | 96.1 | 89.7 | 76.4 | 87.6 | 84.2 | 61.7 | 53.6 | 82.0 | 85.1 | 53.7 | 33.9 | 47.9 | 62.8 | 75.7 | 0.22 |
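The table does not say how its summary columns are computed. The short Python sketch below recomputes them for the HTA and HTA* rows as a consistency check. It works from the arithmetic alone, without any statement in the source, and assumes two things: each group "Mean" is the plain average of that group's tasks, and "Mean Total" averages all 19 tasks rather than the three group means. It also reads "Params. (M)" as a parameter count in millions. The names `ROWS`, `GROUP_SIZES`, and `summarize` are hypothetical and chosen only for illustration.

```python
# Per-task accuracies (%) copied from the HTA and HTA* rows of the table.
# Order: 7 Natural, then 4 Specialized, then 8 Structured tasks.
ROWS = {
    "HTA": [76.6, 94.3, 72.5, 99.3, 91.3, 86.2, 56.5,
            87.6, 95.7, 85.0, 75.7,
            82.6, 63.3, 52.5, 81.0, 84.5, 52.6, 34.5, 47.3],
    "HTA*": [79.0, 92.8, 77.6, 99.6, 92.4, 89.4, 55.1,
             88.2, 96.1, 89.7, 76.4,
             84.2, 61.7, 53.6, 82.0, 85.1, 53.7, 33.9, 47.9],
}

# Number of tasks per group, in table order (dicts keep insertion order).
GROUP_SIZES = {"Natural": 7, "Specialized": 4, "Structured": 8}


def summarize(scores):
    """Return per-group means plus two candidate overall averages."""
    if len(scores) != sum(GROUP_SIZES.values()):
        raise ValueError("expected one score per task (19 in total)")
    out = {}
    start = 0
    for name, size in GROUP_SIZES.items():
        group = scores[start:start + size]
        out[name] = sum(group) / size
        start += size
    # Assumption: "Mean Total" is the average over all 19 tasks.
    out["task_mean"] = sum(scores) / len(scores)
    # Alternative summary for comparison: the average of the three group means.
    out["group_mean"] = sum(out[g] for g in GROUP_SIZES) / len(GROUP_SIZES)
    return out


for method, scores in ROWS.items():
    summary = summarize(scores)
    print(method, {k: round(v, 1) for k, v in summary.items()})

# "Params. (M)" values from the table, read as millions of parameters.
FULL_FT_PARAMS_M = 85.80
HTA_PARAMS_M = 0.22
param_fraction = HTA_PARAMS_M / FULL_FT_PARAMS_M
print(f"HTA: {100 * param_fraction:.2f}% of the full fine-tuning parameter count")
```

For HTA, the sketch reproduces the table's group means of 82.4, 86.0, and 62.3 and the overall 74.7. Averaging the three group means instead would give about 76.9, which suggests that "Mean Total" is a per-task average. The same check holds for the full fine-tuning row: its 19-task average is about 65.6, while its group-mean average is about 69.0. The last line puts HTA's listed parameter count at roughly 0.26% of the 85.80M listed for full fine-tuning.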