notesum.ai
Published on December 9
The Narrow Gate: Localized Image-Text Communication in Vision-Language Models
Categories: cs.CV, cs.LG
Released Date: December 9, 2024
Authors: Alessandro Serra, Francesco Ortu¹, Emanuele Panizon², Lucrezia Valeriani¹, Lorenzo Basile¹, Alessio Ansuini², Diego Doimo², Alberto Cazzaniga²
Affiliations: ¹University of Trieste, Trieste, Italy; ²AREA Science Park, Trieste, Italy

| Model | Ablation | VQAv2 | Flickr | MS-COCO | ImageNet |
|---|---|---|---|---|---|
| Chameleon-7B | - | | | | |
| | [EOI] | | | | |
| | random | | | | |
| Chameleon-34B | - | | | | |
| | [EOI] | | | | |
| | random | | | | |
| Pixtral | - | | | | |
| | [EOI] | | | | |
| | random | | | | |