Published on November 5

Learning to Unify Audio, Visual and Text for Audio-Enhanced Multilingual Visual Answer Localization

cs.MM
cs.AI
cs.CL
cs.HC
cs.IR

Release Date: November 5, 2024

Authors: Zhibin Wen (1), Bin Li

Aff.: (1) Systems Engineering Institute, Xi'an Jiaotong University

Arxiv: http://arxiv.org/abs/2411.02851v1