$S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models
cs.CV
Released Date: December 6, 2024
Authors: Xiaojie Yin¹, Qilong Wang¹, Bing Cao¹, Qinghua Hu¹
Affiliation: ¹Tianjin University

| Category | Method | Flowers | DTD | Pets | Cars | UCF | CalTech | Food | SUN | Aircraft | EuroSAT | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | CLIP-ViT-B/16 | 67.28 | 44.44 | 87.98 | 65.24 | 65.08 | 92.98 | 83.80 | 62.55 | 23.70 | 41.41 | 63.45 |
| Prompt Engineering | DCLIP† [23] | 70.52 | 49.82 | 87.30 | 66.70 | 70.34 | 93.96 | 84.50 | 67.47 | 24.81 | 44.37 | 65.98 |
| Prompt Engineering | CuPL† [30] | 73.57 | 49.17 | 91.25 | 66.10 | 70.31 | 93.96 | 84.44 | 67.66 | 27.84 | 50.70 | 67.50 |
| Prompt Engineering | REAL† [28] | 73.20 | 51.12 | 91.41 | 66.45 | 65.40 | 90.22 | 83.71 | 62.61 | 24.69 | 54.44 | 66.33 |
| Prompt Engineering | MPVR [25] | 76.90 | 56.10 | 89.90 | 65.40 | 70.90 | 94.10 | 86.40 | 68.80 | 28.00 | 59.60 | 69.61 |
| Prompt Engineering | (Ours) | 81.36 | 53.96 | 91.58 | 66.45 | 70.39 | 93.59 | 84.02 | 67.77 | 29.73 | 61.51 | 70.04 |
| Test-Time Adaptation | TPT [38] | 68.98 | 47.75 | 87.79 | 66.87 | 68.04 | 94.16 | 84.67 | 65.50 | 24.78 | 42.44 | 65.10 |
| Test-Time Adaptation | DiffTPT [11] | 70.10 | 47.00 | 88.22 | 67.01 | 68.22 | 92.49 | 87.23 | 65.74 | 25.60 | 43.13 | 65.47 |
| Test-Time Adaptation | MTA [46] | 68.06 | 45.90 | 88.24 | 68.47 | 66.69 | 94.21 | 85.00 | 66.67 | 25.20 | 45.36 | 65.58 |
| Test-Time Adaptation | TPS [40] | 71.54 | 50.47 | 87.35 | 69.06 | 71.00 | 95.09 | 85.23 | 68.98 | 26.34 | 44.48 | 66.95 |
| Test-Time Adaptation | OnZeta [31] | 69.63 | 48.58 | 89.32 | 69.03 | 69.94 | 93.89 | 86.35 | 69.01 | 28.29 | 56.74 | 68.08 |
| Test-Time Adaptation | (Ours) | 81.65 | 54.08 | 92.04 | 67.17 | 71.24 | 93.71 | 84.23 | 68.06 | 30.30 | 60.72 | 70.32 |
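
The Avg. column appears to be the unweighted arithmetic mean of the ten per-dataset accuracies. The short Python sketch below checks that reading against three rows copied from the table. The dictionary keys and the helper name `mean_accuracy` are illustrative assumptions, not names from the paper.

```python
# Per-dataset accuracies (%) copied from the table, in column order:
# Flowers, DTD, Pets, Cars, UCF, CalTech, Food, SUN, Aircraft, EuroSAT.
# The row labels below are illustrative, not the paper's own identifiers.
rows = {
    "CLIP-ViT-B/16 (baseline)": [67.28, 44.44, 87.98, 65.24, 65.08,
                                 92.98, 83.80, 62.55, 23.70, 41.41],
    "Ours (prompt engineering)": [81.36, 53.96, 91.58, 66.45, 70.39,
                                  93.59, 84.02, 67.77, 29.73, 61.51],
    "Ours (test-time adaptation)": [81.65, 54.08, 92.04, 67.17, 71.24,
                                    93.71, 84.23, 68.06, 30.30, 60.72],
}

# Avg. values as reported in the table for the same rows.
reported_avg = {
    "CLIP-ViT-B/16 (baseline)": 63.45,
    "Ours (prompt engineering)": 70.04,
    "Ours (test-time adaptation)": 70.32,
}


def mean_accuracy(values):
    """Unweighted mean of the accuracies, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


for name, values in rows.items():
    computed = mean_accuracy(values)
    # Allow a small tolerance for floating-point representation.
    matches = abs(computed - reported_avg[name]) < 1e-9
    print(f"{name}: computed {computed:.2f}, reported {reported_avg[name]:.2f}, match={matches}")
```

For these three rows the recomputed means agree with the reported Avg. values to two decimals, which is consistent with an unweighted average across datasets.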