CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching
Categories: cs.SD, cs.AI, eess.AS
Release Date: November 4, 2024
Authors: Yu Pan¹, Yuguang Yang², Jixun Yao², Jianhao Ye², Hongbin Zhou², Lei Ma³, Jianjun Zhao¹
Affiliations: ¹Kyushu University, Japan; ²Ximalaya Inc., China; ³University of Tokyo, Japan
| Method | NMOS (↑) | SMOS (↑) | WER (↓) | UTMOS (↑) | SECS (↑) |
|---|---|---|---|---|---|
| GT | 4.18 | - | 2.01 | 4.19 | - |
| DiffVC | 3.75 | 3.66 | 3.08 | 3.68 | 0.61 |
| NS2VC | 3.65 | 3.51 | 2.94 | 3.64 | 0.53 |
| VALLE-VC | 3.80 | 3.79 | 2.77 | 3.72 | 0.65 |
| SEFVC | 3.68 | 3.76 | 3.75 | 3.51 | 0.63 |
| CTEFM-VC | 3.91 | 4.15 | 2.41 | 3.98 | 0.77 |
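
To make the comparison in the table easier to read, the following minimal sketch computes the relative gain of CTEFM-VC over the strongest baseline on each metric. The numbers are copied from the table; the helper name `relative_gain` and the choice to compare against the best baseline per metric are assumptions made for illustration, not something the source specifies. WER is treated as lower-is-better and every other metric as higher-is-better.

```python
# Scores copied from the results table (GT row omitted: it is a reference, not a baseline).
baselines = {
    "DiffVC":   {"NMOS": 3.75, "SMOS": 3.66, "WER": 3.08, "UTMOS": 3.68, "SECS": 0.61},
    "NS2VC":    {"NMOS": 3.65, "SMOS": 3.51, "WER": 2.94, "UTMOS": 3.64, "SECS": 0.53},
    "VALLE-VC": {"NMOS": 3.80, "SMOS": 3.79, "WER": 2.77, "UTMOS": 3.72, "SECS": 0.65},
    "SEFVC":    {"NMOS": 3.68, "SMOS": 3.76, "WER": 3.75, "UTMOS": 3.51, "SECS": 0.63},
}
proposed = {"NMOS": 3.91, "SMOS": 4.15, "WER": 2.41, "UTMOS": 3.98, "SECS": 0.77}

# Assumption: WER is the only metric where a smaller value is better.
LOWER_IS_BETTER = {"WER"}


def relative_gain(metric):
    """Return (best_baseline_name, percent_gain) of CTEFM-VC on one metric.

    Hypothetical helper: the gain is measured relative to the best
    baseline score, with the sign flipped for lower-is-better metrics.
    """
    values = {name: scores[metric] for name, scores in baselines.items()}
    lower = metric in LOWER_IS_BETTER
    best_name = min(values, key=values.get) if lower else max(values, key=values.get)
    best = values[best_name]
    diff = (best - proposed[metric]) if lower else (proposed[metric] - best)
    return best_name, 100.0 * diff / best


gains = {metric: relative_gain(metric) for metric in proposed}

for metric, (name, pct) in gains.items():
    print(f"{metric}: {pct:+.1f}% relative to {name}")
```

Under these assumptions, VALLE-VC is the strongest baseline on every metric, and the relative gains come out to roughly 18.5% for SECS and 7.0% for UTMOS.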