SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Categories: cs.SD, cs.AI, eess.AS
Released Date: November 25, 2024
Authors: Youngjun Sim¹, Jinsung Yoon¹, Young-Joo Suh¹
Affiliation: ¹Graduate School of Artificial Intelligence, POSTECH, Pohang, South Korea

| | seen-to-seen | | | | | unseen-to-unseen | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Model | MOS | SMOS | WER(%) | CER(%) | EER(%) | MOS | SMOS | WER(%) | CER(%) | EER(%) |
| VQMIVC [9] | 3.17 ± 0.17 | 2.66 ± 0.16 | 49.85 | 29.90 | 30.1 | 1.74 ± 0.13 | 1.58 ± 0.12 | 62.71 | 39.13 | 39.0 |
| YourTTS [11] | 3.03 ± 0.19 | 3.41 ± 0.15 | 31.22 | 16.27 | 5.0 | 1.98 ± 0.13 | 2.09 ± 0.15 | 41.04 | 23.65 | 15.9 |
| FreeVC [12] | 3.88 ± 0.14 | 4.10 ± 0.13 | 10.36 | 4.02 | 3.4 | 3.79 ± 0.13 | 2.56 ± 0.16 | 10.64 | 5.70 | 16.4 |
| SKQVC | 3.91 ± 0.13 | 4.28 ± 0.12 | 8.42 | 3.32 | 3.1 | 3.84 ± 0.14 | 3.95 ± 0.16 | 10.05 | 5.37 | 10.8 |
| GT | 4.24 ± 0.15 | - | 5.53 | 1.79 | - | 4.31 ± 0.13 | - | 8.09 | 4.31 | - |
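
The WER(%) and CER(%) columns above are word and character error rates. As a minimal sketch of the standard definition (edit distance between a reference transcript and a recognized transcript, divided by the reference length), the Python below may help when reading the numbers. The function names `edit_distance`, `error_rate`, `wer`, and `cer` are illustrative, not from the paper, and the ASR model and text normalization used to produce the table are not stated here, so this is not a reproduction of the authors' evaluation pipeline.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Error rate in percent, normalized by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are characters (spaces ignored here,
    which is an assumption; conventions differ between toolkits)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {wer(ref, hyp):.2f}%  CER = {cer(ref, hyp):.2f}%")
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can exceed 100%. Lower is better for both columns.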