notesum.ai

Published on November 26, 2024

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

Categories: cs.MM, cs.CV, cs.SD, eess.AS

Release Date: November 26, 2024

Authors: Akshita Gupta1, Tatiana Likhomanenko2, Karren Dai Yang, Richard He Bai, Zakaria Aldeneh2, Navdeep Jaitly2

Affiliations: 1University of Guelph; 2Apple

arXiv: http://arxiv.org/abs/2411.17690v1