notesum.ai

Published on November 5

Inference Optimal VLMs Need Only One Visual Token but Larger Models

cs.CV
cs.AI
cs.LG

Release Date: November 5, 2024

Authors: Kevin Y. Li¹, Sachin Goyal¹, Joao D. Semedo², J. Zico Kolter¹

Affiliations: ¹Carnegie Mellon University; ²Bosch Center for Artificial Intelligence

arXiv: http://arxiv.org/abs/2411.03312v1