notesum.ai

Published on November 21

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

cs.CV

Release Date: November 21, 2024

Authors: Yuhao Dong¹, Zuyan Liu², Hai-Long Sun², Jingkang Yang¹, Winston Hu², Yongming Rao³, Ziwei Liu¹

Affiliations: ¹S-Lab, NTU; ²Tencent; ³Tencent, Tsinghua University

arXiv: http://arxiv.org/abs/2411.14432v1