Published on December 10

DRUM: Learning Demonstration Retriever for Large MUlti-modal Models

cs.CL

Released Date: December 10, 2024

Authors: Ellen Yi-Ge¹, Jiechao Gao², Wei Han³, Wei Zhu⁴

Aff.: ¹Carnegie Mellon University, PA, United States; ²University of Virginia, VA, United States; ³Independent Researcher, TX, United States; ⁴University of Hong Kong, HK, China

arXiv: http://arxiv.org/pdf/2412.07619v1

Retrieval	VQA			ImageCLS		ImageCap
Methods	VQAv2	VizWiz	OK-VQA	Flowers102	Hateful-Memes	Flickr30K	NoCaps
Null	36.7	14.3	11.8	13.4	44.6	17.3	19.6
Random	46.2	33.6	26.3	31.3	51.3	27.5	29.8
Fixed	46.3	32.4	27.8	32.0	51.1	28.7	29.6
BM25	48.4	24.8	25.2	25.6	46.5	23.8	24.6
Dino	49.3	36.8	29.9	35.7	53.2	29.0	28.8
BGE	48.7	27.9	31.6	26.3	46.7	23.7	24.8
CLIP	49.9	48.2	33.4	36.5	55.4	29.2	30.7
EPR	50.6	51.2	34.7	38.7	56.7	30.1	31.6
Dr-VL	52.4	54.6	36.8	40.2	59.6	31.7	33.9