INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
cs.CV
cs.AI
cs.CL
cs.IR
Release Date: November 4, 2024
Authors: Edward Vendrow¹, Omiros Pantazis², Alexander Shepard³, Gabriel Brostow², Kate E. Jones², Oisin Mac Aodha⁴, Sara Beery¹, Grant Van Horn⁵
Affiliations: ¹Massachusetts Institute of Technology; ²University College London; ³iNaturalist; ⁴University of Edinburgh; ⁵University of Massachusetts Amherst

| Method | AP | nDCG | MRR |
|---|---|---|---|
| Random | 22.1 | 52.6 | 0.35 |
| Embedding models | | | |
| CLIP ViT-B-32 Radford et al. [2021] | 30.2 | 59.1 | 0.47 |
| CLIP ViT-L-14 Radford et al. [2021] | 36.8 | 64.2 | 0.57 |
| CLIP ViT-H-14 Fang et al. [2024] | 42.6 | 68.7 | 0.66 |
| SigLIP SO400m-14 Zhai et al. [2023] | 50.1 | 73.5 | 0.72 |
| Proprietary multimodal models | | | |
| GPT-4V Achiam et al. [2023] | 47.8 | 71.9 | 0.70 |
| GPT-4o OpenAI [2024] | 59.6 | 78.9 | 0.78 |
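
The table reports three ranking metrics: AP, nDCG, and MRR. As a reading aid, the following minimal sketch shows the textbook definitions of these metrics for a single query with binary relevance labels. It is not the benchmark's evaluation code. The function names, the binary-label assumption, and the choice to normalize AP by the number of relevant items found in the ranked list are illustrative assumptions. The values in the table also suggest that AP and nDCG are shown on a 0-100 scale and MRR on a 0-1 scale, whereas the sketch returns all three on a 0-1 scale.

```python
import math


def average_precision(relevance):
    """AP for one ranked list of binary labels (1 = relevant, 0 = not).

    Assumption: every relevant item appears somewhere in the list, so the
    normalizer is the number of relevant items found in it.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def dcg(relevance):
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance, start=1))


def ndcg(relevance):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant item, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, ranked_lists):
    """Average a per-query metric over queries (gives mAP, mean nDCG, MRR)."""
    return sum(metric(r) for r in ranked_lists) / len(ranked_lists)


# Hypothetical ranked results for two queries (illustrative labels only).
example_queries = [
    [0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0],
]

if __name__ == "__main__":
    print("mAP :", mean_over_queries(average_precision, example_queries))
    print("nDCG:", mean_over_queries(ndcg, example_queries))
    print("MRR :", mean_over_queries(reciprocal_rank, example_queries))
```

Averaging `reciprocal_rank` over queries gives MRR, and averaging `average_precision` gives mean AP. Multiplying by 100 would put the results on the percentage scale that the AP and nDCG columns appear to use.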