notesum.ai

Published on December 9

Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment

cs.CV
cs.MM
cs.SD
eess.AS

Release Date: December 9, 2024

Authors: Kim Sung-Bin (1), Arda Senocak (2), Hyunwoo Ha (1), Tae-Hyun Oh (3)

Affiliations: (1) POSTECH, Pohang, Republic of Korea; (2) KAIST, Daejeon, Republic of Korea; (3) POSTECH, Pohang, Republic of Korea; Yonsei University, Seoul, Republic of Korea

arXiv: http://arxiv.org/pdf/2412.06209v1