Published on December 4

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Categories: cs.CV, cs.AI, cs.LG

Release Date: December 4, 2024

Authors: Mahtab Bigverdi¹, Zelun Luo², Cheng-Yu Hsieh¹, Ethan Shen¹, Dongping Chen¹, Linda G. Shapiro¹, Ranjay Krishna¹

Affiliations: ¹University of Washington; ²Google Research

arXiv: http://arxiv.org/pdf/2412.03548v1