notesum.ai

Published on November 21

Multimodal Autoregressive Pre-training of Large Vision Encoders

cs.CV

Release Date: November 21, 2024

Authors: Enrico Fini¹, Mustafa Shukor¹, Xiujun Li¹, Philipp Dufter¹, Michal Klein¹, David Haldimann¹, Sai Aitharaju¹, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan¹, Alexander T Toshev, Marcin Eichner¹, Moin Nabi¹, Yinfei Yang¹, Joshua M. Susskind, Alaaeldin El-Nouby¹

Aff.: ¹Apple

arXiv: http://arxiv.org/abs/2411.14402v1