LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
cs.AI
Released Date: October 21, 2024
Authors: Yiwei Guo¹, Zhihan Li¹, Chenpeng Du¹, Hankun Wang¹, Xie Chen¹, Kai Yu¹
Aff.: ¹MoE Key Lab of Artificial Intelligence, AI Institute, X-LANCE Lab, Shanghai Jiao Tong University, Shanghai, China

| Acoustic Speech Codecs | Codebooks | Frame Rate | Codebook Size | Bitrate (kbps) | Speaker Decouple | Supervised Data |
| --- | --- | --- | --- | --- | --- | --- |
| EnCodec (8VQ) [5] | 8 | 75Hz | 1024 | 6.00 | No | No |
| DAC [6] | 9 | 86Hz | 1024 | 8.00 | No | No |
| TiCodec [17] | 1-4 | 75Hz | 1024 | 0.75-3.00 | Implicit | No |
| FACodec [21] | 6 | 80Hz | 1024 | 4.80 | Explicit | Yes |
| SSVC [20] | 4 | 50Hz | 512 | 1.80 | Explicit | No |
| SpeechTokenizer [19] | 8 | 50Hz | 1024 | 4.00 | Implicit | No |
| Single-Codec [18] | 1 | 23Hz | 8192 | 0.30 | Implicit | No |
| WavTokenizer-40Hz [25] | 1 | 40Hz | 4096 | 0.48 | No | No |
| WavTokenizer-75Hz | 1 | 75Hz | 4096 | 0.90 | No | No |
| LSCodec-50Hz (Ours) | 1 | 50Hz | 300 | 0.45 | Explicit | No |
| LSCodec-25Hz (Ours) | 1 | 25Hz | 1024 | 0.25 | Explicit | No |
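
The bitrate column appears to follow from the three numeric columns before it: number of codebooks, frame rate, and codebook size. The sketch below is an illustration under that assumption, not a formula stated in the source. It multiplies the codebook count by the frame rate and by the number of bits needed to index one codebook entry, rounded up to a whole bit. The function name `bitrate_kbps` and the `rows` dictionary are hypothetical and hold a subset of the rows above. The DAC row is left out because its nominal 86 Hz frame rate gives 7.74 kbps under this formula, so the listed 8.00 is presumably rounded or based on different settings.

```python
import math


def bitrate_kbps(num_codebooks: int, frame_rate_hz: float, codebook_size: int) -> float:
    """Estimate codec bitrate in kbps (assumed formula, inferred from the table).

    Each frame emits one token per codebook, and each token is assumed to
    cost ceil(log2(codebook_size)) bits.
    """
    bits_per_token = math.ceil(math.log2(codebook_size))
    return num_codebooks * frame_rate_hz * bits_per_token / 1000.0


# (codebooks, frame rate in Hz, codebook size) copied from the table rows.
rows = {
    "EnCodec (8VQ)": (8, 75, 1024),
    "FACodec": (6, 80, 1024),
    "SSVC": (4, 50, 512),
    "Single-Codec": (1, 23, 8192),
    "WavTokenizer-40Hz": (1, 40, 4096),
    "LSCodec-50Hz": (1, 50, 300),
    "LSCodec-25Hz": (1, 25, 1024),
}

for name, (n_q, rate, size) in rows.items():
    print(f"{name}: {bitrate_kbps(n_q, rate, size):.2f} kbps")
```

Rounding the per-token cost up to a whole bit is what makes the LSCodec-50Hz row come out at 0.45 kbps. Using the fractional value of log2(300) would give roughly 0.41 kbps instead.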