Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey
Categories: cs.CL, cs.AI, cs.LG, cs.MM, cs.SD, eess.AS
Release Date: December 9, 2024
Authors: Tianxin Xie, Yan Rong, Pengfei Zhang, Li Liu
Aff.: Hong Kong University of Science and Technology (Guangzhou) (all authors)

| Method | Modeling | Code | Year |
| --- | --- | --- | --- |
| VQ-Wav2Vec [164] | SSCP | https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec#vq-wav2vec | 2019 |
| Wav2Vec 2.0 [165] | SSCP | https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec | 2020 |
| HuBERT [166] | SSCP | https://github.com/facebookresearch/fairseq/tree/main/examples/hubert | 2021 |
| W2v-BERT 2.0 [167] | SSCP | https://huggingface.co/facebook/w2v-bert-2.0 | 2023 |
| SoundStream [168] | RVQGAN | https://github.com/wesbz/SoundStream | 2021 |
| EnCodec [169] | RVQGAN | https://github.com/facebookresearch/encodec | 2022 |
| HiFi-Codec [170] | RVQGAN | https://github.com/yangdongchao/AcademiCodec | 2023 |
| SpeechTokenizer [171] | RVQGAN | https://github.com/ZhangXInFD/SpeechTokenizer | 2023 |
| Descript Audio Codec [172] | RVQGAN | https://github.com/descriptinc/descript-audio-codec | 2023 |
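
The RVQGAN rows in the table refer to codecs built around residual vector quantization (RVQ), in which each quantizer stage encodes the error left by the previous stage. The following is a minimal sketch of that idea only, assuming tiny hand-written codebooks (`CODEBOOKS` and the helper names `nearest`, `rvq_encode`, `rvq_decode` are hypothetical and do not come from any of the listed repositories). The real codecs learn their codebooks jointly with an encoder, a decoder, and adversarial losses, none of which is shown here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared L2 distance)."""
    def dist(entry: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry, x))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage codes the residual of the previous one."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Reconstruct a vector by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)],
    [(0.0, 0.0), (0.25, 0.0), (0.0, -0.25)],
]

codes = rvq_encode((1.2, 0.9), CODEBOOKS)
recon = rvq_decode(codes, CODEBOOKS)
print(codes, recon)
```

Each input vector is represented by one index per stage, so adding stages refines the reconstruction at the cost of more tokens per frame.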