notesum.ai

Published on November 8

Tell What You Hear From What You See -- Video to Audio Generation Through Text

Categories: cs.CV, cs.AI, cs.LG, cs.SD, eess.AS

Release Date: November 8, 2024

Authors: Xiulong Liu, Kun Su, Eli Shlizerman

arXiv: http://arxiv.org/abs/2411.05679v1