StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
eess.AS
cs.SD
Released Date: December 6, 2024
Authors: Jixun Yao¹, Yuguang Yan², Yu Pan², Ziqian Ning¹, Jiaohao Ye², Hongbin Zhou², Lei Xie¹
Affiliations: ¹Northwestern Polytechnical University; ²Ximalaya Inc

| Model | nMOS | sMOS-p | UTMOS | WER | SECS | RTF | #Param. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GT | 4.33 | - | 4.24 | 1.61 | - | - | - |
| LMVC | 3.01 | 3.17 | 3.32 | 4.17 | 0.61 | 3.891 | 305M |
| VALLE-VC | 3.56 | 3.65 | 3.63 | 3.08 | 0.55 | 3.944 | 302M |
| StyleVC | 3.24 | 3.21 | 3.41 | 5.21 | 0.43 | 0.075 | 31M |
| NS2VC | 3.32 | 3.16 | 3.49 | 4.88 | 0.44 | 0.337 | 435M |
| DDDM-VC | 3.67 | 3.61 | 3.75 | 3.07 | 0.51 | 0.287 | 66M |
| SEF-VC | 3.63 | 3.72 | 3.59 | 2.89 | 0.53 | 0.168 | 260M |
| StableVC | 3.96 ± 0.04 | 4.04 ± 0.05 | 4.12 | 2.03 | 0.67 | 0.146 | 166M |
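
The objective columns in the table are not defined in this excerpt. Assuming their conventional meanings (SECS as the cosine similarity between speaker embeddings of the converted and target speech, RTF as processing time divided by audio duration, and WER as the word-level edit distance divided by the reference length, in percent), a minimal Python sketch of the three computations could look like the following. The function names are hypothetical and this is not the authors' evaluation code; in practice the embeddings would come from a speaker verification model and the hypothesis transcript from an ASR system.

```python
import math


def cosine_similarity(a, b):
    """SECS-style score: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF: values below 1.0 mean faster-than-real-time generation."""
    return processing_seconds / audio_seconds


def word_error_rate(reference, hypothesis):
    """WER in percent: word-level Levenshtein distance over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Under these assumptions, lower WER and RTF are better, while higher SECS indicates closer speaker similarity. As an illustration, `real_time_factor(1.46, 10.0)` corresponds to the 0.146 order of magnitude reported for StableVC, i.e. roughly 1.5 seconds of compute per 10 seconds of audio.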