notesum.ai
Grounding Descriptions in Images informs Zero-Shot Visual Recognition
cs.CV
cs.LG
Released Date: December 5, 2024
Authors: Shaunak Halbe (1), Junjiao Tian, K J Joseph, James Seale Smith, Katherine Stevo, Vineeth N Balasubramanian, Zsolt Kira
Aff.: (1) Georgia Institute of Technology

| Data | Model | MS-COCO Image-to-Text R@1 | MS-COCO Image-to-Text R@5 | MS-COCO Image-to-Text R@10 | MS-COCO Text-to-Image R@1 | MS-COCO Text-to-Image R@5 | MS-COCO Text-to-Image R@10 | Flickr30k Image-to-Text R@1 | Flickr30k Image-to-Text R@5 | Flickr30k Image-to-Text R@10 | Flickr30k Text-to-Image R@1 | Flickr30k Text-to-Image R@5 | Flickr30k Text-to-Image R@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CC3M | CLIP | 15.79 | 38.26 | 50.70 | 13.58 | 33.76 | 46.04 | 27.00 | 53.80 | 66.30 | 21.78 | 44.26 | 55.10 |
| CC3M | GRAIN | 38.26 | 65.96 | 77.03 | 28.81 | 55.86 | 69.00 | 59.90 | 81.80 | 88.40 | 42.82 | 68.21 | 76.54 |
| CC3M | GRAIN − CLIP | +22.47 | +27.70 | +26.33 | +15.23 | +22.10 | +22.96 | +32.90 | +28.00 | +22.10 | +21.04 | +23.95 | +21.44 |
| CC12M | CLIP | 41.32 | 69.40 | 80.04 | 30.02 | 57.32 | 69.65 | 59.60 | 84.70 | 89.90 | 43.63 | 68.75 | 76.77 |
| CC12M | GRAIN | 58.30 | 83.07 | 89.67 | 42.66 | 70.77 | 80.83 | 78.00 | 94.60 | 97.80 | 59.36 | 80.01 | 85.59 |
| CC12M | GRAIN − CLIP | +16.98 | +13.67 | +9.63 | +12.64 | +13.45 | +11.18 | +18.40 | +9.90 | +7.90 | +15.73 | +11.26 | +8.82 |
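
The R@K columns above report Recall at K: the percentage of queries for which a correct match appears among the K highest-scoring candidates. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `recall_at_k`, the toy similarity scores, and the ground-truth layout are all hypothetical and chosen only for illustration.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of queries whose top-k candidates include a correct match.

    similarity[i][j] : score between query i and candidate j (higher = closer)
    ground_truth[i]  : set of candidate indices that are correct for query i
    (Both structures are illustrative assumptions, not the paper's format.)
    """
    hits = 0
    for query_idx, scores in enumerate(similarity):
        # Rank candidate indices by descending similarity and keep the top k.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        top_k = ranked[:k]
        # A query counts as a hit if any correct candidate is in the top k.
        if any(j in ground_truth[query_idx] for j in top_k):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy image-to-text example: 3 image queries scored against 4 captions.
# The scores are made up; they are not taken from the table.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # correct caption (0) is ranked 1st
    [0.2, 0.4, 0.8, 0.1],  # correct caption (1) is ranked 2nd
    [0.5, 0.6, 0.1, 0.3],  # correct caption (3) is ranked 3rd
]
ground_truth = [{0}, {1}, {3}]

r1 = recall_at_k(similarity, ground_truth, 1)  # 1 of 3 queries hit
r2 = recall_at_k(similarity, ground_truth, 2)  # 2 of 3 queries hit
r3 = recall_at_k(similarity, ground_truth, 3)  # 3 of 3 queries hit
print(f"R@1={r1:.2f}  R@2={r2:.2f}  R@3={r3:.2f}")
```

Storing the ground truth as a set per query lets one image have several correct captions. Text-to-image retrieval can be scored with the same function by transposing the similarity matrix and swapping the roles of queries and candidates.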