notesum.ai

Published on December 5

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

cs.CV
cs.AI

Release Date: December 5, 2024

Authors: Jiuhai Chen (1), Jianwei Yang (2), Haiping Wu (2), Dianqi Li (2), Jianfeng Gao (2), Tianyi Zhou (1), Bin Xiao (2)

Affiliations: (1) University of Maryland; (2) Microsoft Research

arXiv: http://arxiv.org/pdf/2412.04424v1