CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
cs.CV
Released Date: November 25, 2024
Authors: Yuan Zhou¹, Qingshan Xu¹, Jiequan Cui¹, Junbao Zhou¹, Jing Zhang², Richang Hong³, Hanwang Zhang¹
Affiliations: ¹Nanyang Technological University; ²Beihang University; ³Hefei University of Technology

| | | | | | | OB | | | IS | | | SS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Backbone | Ref. | Type | GMACs | iPhone13 | RTX 4090 | | | | | | | mIoU |
| EfficientFormer-L1 [26] | NeurIPS’ 22 | CONV | ||||||||||
| MobileViG-M [32] | CVPR’ 23 | GNN+CONV | n/a | |||||||||
| MobileViGv2-M [1] | CVPR’ 24 | GNN+CONV | ||||||||||
| ResNet18 [20] | CVPR’ 16 | CONV | ||||||||||
| FastViT-SA12 [42] | CVPR’ 23 | SA+CONV | ||||||||||
| PoolFormer-S12 [55] | CVPR’ 22 | CONV | ||||||||||
| SLAB-PVT-T [14] | ICLR’ 24 | LA | n/a | |||||||||
| FLatten-PVT-T [15] | CVPR’ 23 | LA | ||||||||||
| Agent-PVT-T [17] | ECCV’ 24 | SA | ||||||||||
| PoolFormer-S24 [55] | CVPR’ 22 | CONV | ||||||||||
| EfficientFormer-L3 [26] | NeurIPS’ 22 | CONV | ||||||||||
| ResNet50 [20] | CVPR’ 16 | CONV | ||||||||||
| MLLA-T [16] | NeurIPS’ 24 | LA+CONV | n/a | |||||||||
| Swin-T [28] | CVPR’ 22 | SA | ||||||||||
| FLatten-Swin-T [15] | CVPR’ 23 | LA | ||||||||||
| Agent-Swin-T [17] | ECCV’ 24 | SA | ||||||||||
| SLAB-Swin-T [14] | ICLR’ 24 | LA | n/a | |||||||||
| ResNet101 [20] | CVPR’ 16 | CONV | ||||||||||
| Our CARE-S0 | - | LA+CONV | ||||||||||
| Our CARE-S1 | - | LA+CONV | ||||||||||
| Our CARE-S2 | - | LA+CONV | ||||||||||