Published on November 28
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
cs.CV
cs.CL
cs.LG
Released Date: November 28, 2024
Authors: Mohamed Fazli Imam¹, Rufael Fedaku Marew¹, Jameel Hassan², Mustansar Fiaz³, Alham Fikri Aji¹, Hisham Cholakkal¹
Affiliations: ¹Mohamed Bin Zayed University of AI; ²Mohamed Bin Zayed University of AI, The Johns Hopkins University; ³IBM Research

| Method | Venue | ImageNet | EuroSAT | Caltech101 | OxfordPets | UCF101 | DTD | Flowers102 | SUN397 | RESISC45 | CIFAR10 | CIFAR100 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Few-Shot Methods | |||||||||||||
| CoOp (1-Shot) (Zhou et al. 2022b) | IJCV (‘22) | 60.6 | 58.4 | 91.7 | - | 63.8 | 40.1 | 71.2 | 64.1 | - | 83 | 55.6 | - |
| CoOp (5-Shot) (Zhou et al. 2022b) | IJCV (‘22) | 61.3 | 71.8 | 93.2 | - | 74.3 | 41.1 | 85.8 | 67.3 | - | 86.6 | 63.2 | - |
| CoOp (10-Shot) (Zhou et al. 2022b) | IJCV (‘22) | 62.3 | 81.6 | 94.6 | - | 77.2 | 65.8 | 92.1 | 69 | - | 88.5 | 66.6 | - |
| Label-Free Methods | |||||||||||||
| CLIP (Radford et al. 2021) | ICML (‘21) | 61.9 | 40.6 | 90.5 | 85.0 | 61.0 | 42.9 | 66.6 | 60.8 | 49.8 | 88.8 | 64.2 | 64.7 |
| CuPL (Pratt et al. 2023) | ICCV (‘23) | 63.4 | 62.2 | 90.6 | 87.2 | 63.9 | 48.0 | 71.5 | 66.0 | 61.9 | 89.2 | 65.8 | 70.0 |
| MetaPrompt (Mirza et al. 2024a) | ECCV (‘24) | 65.0 | 55.6 | 92.9 | 88.1 | 67.9 | 50.8 | 73.9 | 67.0 | 64.0 | 89.9 | 66.3 | 71.0 |
| LaFTer (Mirza et al. 2024b) | NeurIPS (‘24) | 64.2 | 73.9 | 93.3 | 82.7 | 68.2 | 46.1 | 71.0 | 64.5 | 68.3 | 95.8 | 74.6 | 73.0 |
| ProText (Khattak et al. 2024) | Arxiv | 64.9 | 51.4 | 93.4 | 89.0 | 66.4 | 50.7 | 74.2 | 66.8 | 57.4 | 89.5 | 66.1 | 70.0 |
| WaffleCLIP (Roth et al. 2023) | ICCV (‘23) | 63.5 | 46.7 | 94.8 | 88.1 | 65.8 | 51.0 | 68.7 | 65.6 | 63.4 | 90.9 | 67.2 | 69.6 |
| Ours | - | 65.4 | 73.5 | 94.8 | 89.3 | 68.3 | 56.1 | 82.7 | 67.0 | 75.4 | 94.9 | 75.6 | 76.6 |
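
The Average column is not defined on this page. A minimal check, assuming it is the unweighted mean of the 11 per-dataset scores (ImageNet through CIFAR100) rounded to one decimal, can be sketched in Python. The dictionary name `rows` and the helper `mean_score` are illustrative only and do not come from the paper; the numbers are copied from the CLIP and Ours rows of the table.

```python
# Per-dataset scores copied from the table (ImageNet ... CIFAR100, 11 datasets).
# Assumption: "Average" is the plain unweighted mean over these 11 columns.
rows = {
    "CLIP": [61.9, 40.6, 90.5, 85.0, 61.0, 42.9, 66.6, 60.8, 49.8, 88.8, 64.2],
    "Ours": [65.4, 73.5, 94.8, 89.3, 68.3, 56.1, 82.7, 67.0, 75.4, 94.9, 75.6],
}


def mean_score(values):
    """Return the unweighted mean of the scores, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


# Recompute the Average column for each method.
averages = {name: mean_score(vals) for name, vals in rows.items()}

# Difference between the two recomputed averages, in points.
gain = round(averages["Ours"] - averages["CLIP"], 1)

for name, avg in averages.items():
    print(f"{name}: {avg}")
print(f"Gain over CLIP: {gain}")
```

Under that assumption the recomputed means match the reported Average entries for both rows (64.7 for CLIP and 76.6 for Ours), a difference of 11.9 points. Rows with missing entries (marked `-`), such as the CoOp rows, have no reported average and are left out of the check.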