Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
cs.SD
eess.AS
Released Date: November 11, 2024
Authors: Reo Yoneyama, Atsushi Miyashita, Ryuichi Yamamoto, Tomoki Toda
Aff.: Not specified
| Model | Prior | Input type | Output type | Subset | VUV | RMSE | STFT | PESQ | UTMOS |
|---|---|---|---|---|---|---|---|---|---|
| N-RI-RI | Noise | Real, imaginary | Real, imaginary | I | 10 | 0.111 | 0.711 | 3.108 | 2.038 |
| | | | | II | 34 | 0.212 | 0.821 | 2.288 | 1.465 |
| | | | | III | 4 | 0.078 | 0.763 | 3.306 | 1.424 |
| S-RI-RI | Sine | Real, imaginary | Real, imaginary | I | 6 | 0.085 | 0.701 | 3.733 | 2.972 |
| | | | | II | 13 | 0.127 | 0.708 | 3.589 | 3.326 |
| | | | | III | 4 | 0.072 | 0.839 | 3.209 | 1.524 |
| H-RI-RI | Harmonic | Real, imaginary | Real, imaginary | I | 6 | 0.081 | 0.678 | 3.818 | 3.122 |
| | | | | II | 13 | 0.114 | 0.683 | 3.702 | 3.585 |
| | | | | III | 4 | 0.071 | 0.722 | 3.927 | 1.686 |
| H-LA-LP | Harmonic | Log amplitude, absolute phase | Log amplitude, IPW | I | 6 | 0.080 | 0.682 | 3.793 | 3.071 |
| | | | | II | 13 | 0.117 | 0.687 | 3.684 | 3.531 |
| | | | | III | 4 | 0.070 | 0.714 | 3.934 | 1.686 |
| H-L-LP | Harmonic | Log amplitude | Log amplitude, IPW | I | 8 | 0.098 | 0.683 | 3.47 | 2.309 |
| | | | | II | 22 | 0.167 | 0.712 | 2.84 | 1.994 |
| | | | | III | 4 | 0.075 | 0.720 | 3.67 | 1.524 |