
Published on November 9

A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

cs.AI

Released Date: November 9, 2024

Authors: Chia Xin Liang (1), Pu Tian (2), Caitlyn Heqi Yin (3), Yao Yua (4), Wei An-Hou (5), Li Ming (6), Tianyang Wang (7), Ziqian Bi (8), Ming Liu (8)

Affiliations: (1) JTB Technology Corp.; (2) Stockton University; (3) University of Wisconsin-Madison; (4) AppCubic USA; (5) Nomad Sustaintech LTD; (6) Georgia Institute of Technology; (7) University of Liverpool; (8) Purdue University

arXiv: http://arxiv.org/abs/2411.06284v1