notesum.ai

Published on December 9

iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models

cs.CV

Release Date: December 9, 2024

Authors: Lianyu Hu¹, Fanhua Shang, Liang Wan, Wei Feng

Affiliation: ¹College of Intelligence and Computing, Tianjin University, China

arXiv: http://arxiv.org/pdf/2412.06263v1