BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions
cs.CV
cs.AI
Released Date: November 12, 2024
Authors: Anas Awadalla¹, Le Xue², Manli Shu², An Yan², Jun Wang², Senthil Purushwalkam², Sheng Shen³, Hannah Lee¹, Oscar Lo¹, Jae Sung Park¹, Etash Guha¹, Silvio Savarese⁴, Ludwig Schmidt¹, Yejin Choi¹, Caiming Xiong², Ran Xu²
Affiliations: ¹University of Washington; ²Salesforce Research; ³University of California, Berkeley; ⁴Stanford University
![[Uncaptioned image]](https://arxiv.org/html/2411.07461v1/extracted/5989662/figs/kale.png)
| Model | TextVQA | VQAv2 | ScienceQA | AI2D | MMBench | ChartQA | InfoVQA | OCRBench | RealWorldQA | MMStar | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 59.92 | 70.10 | 72.68 | 65.64 | 58.59 | 23.28 | 29.28 | 43.80 | 52.42 | 43.91 | 51.96 |
| CogVLM (Ours) | 59.74 | 69.42 | 70.30 | 65.35 | 61.60 | 23.64 | 29.53 | 43.80 | 52.03 | 41.90 | 51.73 |
| CapsFusion | 57.62 | 67.30 | 71.79 | 62.27 | 60.82 | 22.28 | 27.67 | 43.10 | 52.03 | 43.91 | 50.88 |
| Recap-Datacomp | 58.49 | 67.36 | 71.19 | 63.31 | 52.75 | 23.08 | 28.45 | 42.20 | 53.07 | 41.90 | 50.18 |
| Datacomp | 57.40 | 67.22 | 69.51 | 61.82 | 59.45 | 22.28 | 28.53 | 42.20 | 50.46 | 39.70 | 49.86 |
| LAION-COCO | 54.12 | 65.26 | 65.94 | 59.10 | 55.58 | 21.60 | 26.81 | 38.90 | 44.05 | 38.90 | 47.03 |
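
The Avg column appears to be the unweighted mean of the ten benchmark scores in each row. The sketch below recomputes it from the values in the table as a sanity check. The names `BENCHMARKS`, `ROWS`, and `mean_score` are illustrative only, and the label used for the first row is a placeholder because that row's model name is missing in the source.

```python
# Benchmark columns in the order they appear in the table.
BENCHMARKS = [
    "TextVQA", "VQAv2", "ScienceQA", "AI2D", "MMBench",
    "ChartQA", "InfoVQA", "OCRBench", "RealWorldQA", "MMStar",
]

# Each entry maps a row label to (per-benchmark scores, reported Avg).
# The first label is a placeholder: the model name is missing in the source.
ROWS = {
    "(first row, name missing)": (
        [59.92, 70.10, 72.68, 65.64, 58.59, 23.28, 29.28, 43.80, 52.42, 43.91], 51.96),
    "CogVLM (Ours)": (
        [59.74, 69.42, 70.30, 65.35, 61.60, 23.64, 29.53, 43.80, 52.03, 41.90], 51.73),
    "CapsFusion": (
        [57.62, 67.30, 71.79, 62.27, 60.82, 22.28, 27.67, 43.10, 52.03, 43.91], 50.88),
    "Recap-Datacomp": (
        [58.49, 67.36, 71.19, 63.31, 52.75, 23.08, 28.45, 42.20, 53.07, 41.90], 50.18),
    "Datacomp": (
        [57.40, 67.22, 69.51, 61.82, 59.45, 22.28, 28.53, 42.20, 50.46, 39.70], 49.86),
    "LAION-COCO": (
        [54.12, 65.26, 65.94, 59.10, 55.58, 21.60, 26.81, 38.90, 44.05, 38.90], 47.03),
}


def mean_score(scores):
    """Unweighted mean of a row's scores, rounded to two decimals like the table."""
    return round(sum(scores) / len(scores), 2)


for name, (scores, reported) in ROWS.items():
    # Every row should carry exactly one score per benchmark column.
    assert len(scores) == len(BENCHMARKS)
    print(f"{name}: recomputed {mean_score(scores):.2f}, reported {reported:.2f}")
```

Under this assumption, the recomputed means agree with the reported Avg values for all six rows.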