notesum.ai

Published on December 3

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

cs.CV
cs.AI
cs.CL
cs.MM
cs.SD
eess.AS

Released Date: December 3, 2024

Authors: Kaixiong Gong¹, Kaituo Feng¹, Bohao Li², Yibing Wang¹, Mofan Cheng¹, Shijia Yang³, Jiaming Han¹, Benyou Wang², Yutong Bai⁴, Zhuoran Yang⁵, Xiangyu Yue¹

Affiliations: ¹CUHK MMLab; ²CUHK (SZ); ³Stanford University; ⁴UC Berkeley; ⁵Yale University

arXiv: http://arxiv.org/pdf/2412.02611v1