Published on May 11

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

NeurIPS

Released Date: May 11, 2024

Authors: Haogeng Liu¹, Quanzeng You², Xiaotian Han², Yongfei Liu², Huaibo Huang¹, Ran He¹, Hongxia Yang¹

Affiliations: ¹MAIS & NLPR, Institute of Automation, Chinese Academy of Sciences; ²ByteDance, Inc.

OpenReview: https://openreview.net/pdf/4041aad3b42874372a267d6990a28307a3c622bb.pdf