Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers
cs.CV
Release Date: November 22, 2024

| Name | Dataset | Seen Samples (B) | Img Params (M) | Txt Params (M) | MSCOCO I2T | MSCOCO T2I | Flickr30k I2T | Flickr30k T2I |
|---|---|---|---|---|---|---|---|---|
| DataComp-B/16[7] | DataComp-1B | 13 | 86.2 | 63.4 | 59.4 | 42.3 | 86.3 | 69.8 |
| DataComp-B/32[7] | DataComp-1B | 13 | 86.2 | 63.4 | 53.5 | 37.1 | 79.0 | 61.1 |
| MobileCLIP-S0[33] | DataCompDR-1B | 13 | 11.4 | 42.4 | 58.7 | 40.4 | 85.9 | 67.7 |
| TinyCLIP-63M/32[36] | LAION-400M | 15.8 | 86.2 | 63.4 | 55.5 | 37.6 | 83.2 | 64.4 |
| LAION-B/32[30] | LAION-400M | (-) | 86.2 | 63.4 | 53.3 | 35.4 | 79.3 | 62.0 |
| OpenAI-B/16[28] | WIT-400M | 13 | 86.2 | 63.4 | 58.7 | 40.4 | 85.9 | 67.7 |
| OpenAI-B/32[28] | WIT-400M | 13 | 86.2 | 63.4 | 50.1 | 30.4 | 78.9 | 58.8 |
| OpenAI-RN50[28] | WIT-400M | 13 | 38.3 | 63.4 | 48.8 | 28.5 | 80.0 | 57.4 |
| RILS-B/16[41] | LAION-20M | 0.5 | 86.2 | 37.8 | 32.2 | 25.5 | 45.1 | 34.9 |
| MaskCLIP-B/16[46] | YFCC-15M | 0.38 | 86.2 | 37.8 | 38.5 | 24.8 | 64.9 | 48.1 |
| SLIP-B/16[23] | YFCC-15M | 0.38 | 86.2 | 37.8 | 31.1 | 20.3 | 57.6 | 40.1 |
| TinyCLIP-B/16[36] | YFCC-15M | 0.38 | 86.2 | 37.8 | 26.5 | 17.1 | 51.6 | 32.2 |
| TinyCLIP-39M/16[36] | YFCC-15M | 0.38 | 86.2 | 37.8 | 54.9 | 38.9 | 84.4 | 66.7 |
| CLIPKD-B/16[38] | CC12M+CC3M | 0.48 | 86.2 | 37.8 | 25.0 | 24.7 | 54.6 | 56.6 |
| CLIPKD-RN101[38] | CC12M+CC3M | 0.48 | 56.3 | 37.8 | 25.2 | 25.7 | 57.0 | 55.5 |
| CLIPKD-Swin/T[38] | CC12M+CC3M | 0.48 | 27.9 | 21.3 | 28.5 | 28.6 | 62.2 | 60.9 |
| CLIPKD-MobileNetV3[38] | CC12M+CC3M | 0.48 | 2.0 | 21.3 | 17.9 | 16.0 | 42.4 | 42.3 |
| CLIPKD-RN18[38] | CC12M+CC3M | 0.48 | 11.4 | 21.3 | 21.3 | 19.8 | 47.8 | 47.1 |
| SiCLIP (ours) | CC12M-SYN | 0.38 | 9.78 | 42.4 | 55.7 | 39.7 | 82.0 | 66.6 |