DRUM: Learning Demonstration Retriever for Large MUlti-modal Models
cs.CL
Released Date: December 10, 2024
Authors: Ellen Yi-Ge¹, Jiechao Gao², Wei Han³, Wei Zhu⁴
Affiliations: ¹Carnegie Mellon University, PA, United States; ²University of Virginia, VA, United States; ³Independent Researcher, TX, United States; ⁴University of Hong Kong, HK, China

| Retrieval method | VQA: VQAv2 | VQA: VizWiz | VQA: OK-VQA | ImageCLS: Flowers102 | ImageCLS: Hateful-Memes | ImageCap: Flickr30K | ImageCap: NoCaps |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Null | 36.7 | 14.3 | 11.8 | 13.4 | 44.6 | 17.3 | 19.6 |
| Random | 46.2 | 33.6 | 26.3 | 31.3 | 51.3 | 27.5 | 29.8 |
| Fixed | 46.3 | 32.4 | 27.8 | 32.0 | 51.1 | 28.7 | 29.6 |
| BM25 | 48.4 | 24.8 | 25.2 | 25.6 | 46.5 | 23.8 | 24.6 |
| Dino | 49.3 | 36.8 | 29.9 | 35.7 | 53.2 | 29.0 | 28.8 |
| BGE | 48.7 | 27.9 | 31.6 | 26.3 | 46.7 | 23.7 | 24.8 |
| CLIP | 49.9 | 48.2 | 33.4 | 36.5 | 55.4 | 29.2 | 30.7 |
| EPR | 50.6 | 51.2 | 34.7 | 38.7 | 56.7 | 30.1 | 31.6 |
| Dr-VL | 52.4 | 54.6 | 36.8 | 40.2 | 59.6 | 31.7 | 33.9 |