Published on November 29

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

cs.CV
cs.CL
cs.LG
cs.MM

Released Date: November 29, 2024

Authors: Tiantian Geng (1), Jinrui Zhang (2), Qingni Wang (3), Teng Wang (2), Jinming Duan (1), Feng Zheng (2)

Affiliations: (1) University of Birmingham; (2) Southern University of Science and Technology; (3) University of Electronic Science and Technology of China

arXiv: http://arxiv.org/pdf/2411.19772v1