Published on October 30

Aligning Audio-Visual Joint Representations with an Agentic Workflow

cs.CV
cs.AI
cs.LG
cs.MM
cs.SD
eess.AS

Release Date: October 30, 2024

Authors: Shentong Mo¹, Yibing Song²

Aff.: ¹Carnegie Mellon University, MBZUAI; ²Alibaba Group, Hupan Lab

arXiv: http://arxiv.org/abs/2410.23230v1