Cross-modal Information Flow in Multimodal Large Language Models
cs.AI
cs.CL
cs.CV
Released Date: November 27, 2024
Authors: Zhi Zhang¹, Srishti Yadav², Fengze Han³, Ekaterina Shutova¹
Affiliations: ¹ILLC, University of Amsterdam, Netherlands; ²Dept. of Computer Science, University of Copenhagen, Denmark; ³Dept. of Computer Engineering, Technical University of Munich, Germany

| Name | Structural type | Semantic type | Open/Binary | Image example | Question example | Answer | Num. |
|---|---|---|---|---|---|---|---|
| ChooseAttr | Choose | Attribute | Open | ![Example image](https://arxiv.org/html/2411.18620v1/extracted/6027619/images/dataset/original_image_174.jpg) | What was used to make the door, wood or metal? | Wood | 1000 |
| ChooseCat | Choose | Category | Open | | Which piece of furniture is striated, bed or door? | Bed | 1000 |
| ChooseRel | Choose | Relation | Open | | Is the door to the right or to the left of the bed? | Right | 964 |
| CompareAttr | Compare | Attribute | Open | ![Example image](https://arxiv.org/html/2411.18620v1/extracted/6027619/images/dataset/original_image_2403341.jpg) | What is common to the bike and the dog? | Color | 570 |
| LogicalObj | Logical | Object | Binary | | Are there either women or men that are running? | No | 991 |
| QueryAttr | Query | Attribute | Open | | In which part of the image is the dog? | Left | 1000 |