notesum.ai

Published on November 27, 2024

Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models

cs.CV

Released Date: November 27, 2024

Authors: Jingming Liu¹, Yumeng Li¹, Boyuan Xiao¹, Yichang Jian¹, Ziang Qin¹, Tianjia Shao¹, Yao-Xiang Ding¹, Kun Zhou¹

Aff.: ¹ State Key Laboratory of CAD&CG, Zhejiang University

arXiv: http://arxiv.org/abs/2411.18142v1