notesum.ai

Published on November 29

Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

cs.CV
cs.CL
cs.LG
cs.MM

Release Date: November 29, 2024

Authors: Qiong Wu¹, Wenhao Lin¹, Weihao Ye¹, Yiyi Zhou¹, Xiaoshuai Sun¹, Rongrong Ji¹

Affiliation: ¹ Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China

arXiv: http://arxiv.org/pdf/2411.19628v1