RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
Categories: cs.LG, cs.AI
Released Date: October 18, 2024
Authors: Muhe Ding, Yang Ma, Pengda Qin, Jianlong Wu, Yuhong Li, Liqiang Nie

| Model | Retr-F1 | QA-FL | QA-Acc | QA |
|---|---|---|---|---|
| VLP [52] | 0.69 | 42.6 | 36.7 | 22.6 |
| VLP+VinVL [41] | 0.71 | 44.2 | 38.9 | 24.1 |
| MuRAG [14] | 0.75 | 55.7 | 54.6 | 36.1 |
| SKURG [20] | 0.88 | 55.4 | 57.1 | 37.7 |
| Solar [21] | 0.89 | 60.9 | 58.9 | 40.9 |
| InstructBLIP [30] | - | 51.7 | 59.0 | 31.4 |
| InstructBLIP [30] | - | 53.4 | 62.5 | 35.0 |
| RA-BLIP (T5-base) | - | 62.6 | 59.7 | 41.6 |
| RA-BLIP (T5-large) | - | 62.9 | 60.9 | 42.5 |
| RA-BLIP | 0.83 | 65.1 | 65.3 | 45.8 |
| RA-BLIP | 0.89 | 65.5 | 68.7 | 48.5 |