Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis
cs.SD
cs.AI
cs.HC
eess.AS
Release Date: November 20, 2024
Authors: Pegah Salehi, Sajad Amouei Sheshkal, Vajira Thambawita, Sushant Gautam, Saeed S. Sabet, Dag Johansen, Michael A. Riegler, Pål Halvorsen

| Methods | AFE | Dataset | PSNR | SSIM | LPIPS | LMD | FID | AUE | Sync_conf |
|---|---|---|---|---|---|---|---|---|---|
| RAD-NeRF [26] | Deep-Speech | Obama | 27.14 | 0.9304 | 0.0738 | 2.675 | 31.29 | 1.995 | 7.171 |
| RAD-NeRF [26] | Deep-Speech | Donya | 27.79 | 0.9045 | 0.0917 | 2.750 | 12.82 | 1.911 | 4.720 |
| RAD-NeRF [26] | Deep-Speech | Shaheen | 30.13 | 0.9314 | 0.0697 | 3.199 | 33.05 | 2.837 | 7.330 |
| RAD-NeRF [26] | Deep-Speech | Mean | 28.35 | 0.9221 | 0.0784 | 2.874 | 25.72 | 2.247 | 6.407 |
| RAD-NeRF [26] | HuBERT | Obama | 26.58 | 0.9261 | 0.0769 | 2.762 | 28.78 | 2.006 | 0.563 |
| RAD-NeRF [26] | HuBERT | Donya | 28.05 | 0.9071 | 0.0868 | 2.518 | 14.25 | 2.511 | 0.365 |
| RAD-NeRF [26] | HuBERT | Shaheen | 30.45 | 0.9332 | 0.0729 | 3.050 | 35.96 | 3.229 | 0.494 |
| RAD-NeRF [26] | HuBERT | Mean | 28.36 | 0.9221 | 0.0788 | 2.776 | 26.33 | 2.582 | 0.474 |
| RAD-NeRF [26] | Wav2Vec | Obama | 26.59 | 0.9268 | 0.0785 | 2.696 | 15.15 | 1.707 | 6.744 |
| RAD-NeRF [26] | Wav2Vec | Donya | 27.12 | 0.8972 | 0.0845 | 2.726 | 24.58 | 1.531 | 4.820 |
| RAD-NeRF [26] | Wav2Vec | Shaheen | 30.08 | 0.9306 | 0.0698 | 3.221 | 34.77 | 2.966 | 7.946 |
| RAD-NeRF [26] | Wav2Vec | Mean | 27.93 | 0.9182 | 0.0776 | 2.881 | 24.83 | 2.068 | 6.503 |
| RAD-NeRF [26] | Whisper | Obama | 26.10 | 0.9231 | 0.0723 | 2.573 | 12.67 | 1.693 | 7.143 |
| RAD-NeRF [26] | Whisper | Donya | 28.65 | 0.9138 | 0.0844 | 2.640 | 26.85 | 1.504 | 5.269 |
| RAD-NeRF [26] | Whisper | Shaheen | 30.05 | 0.9303 | 0.0660 | 3.045 | 29.44 | 2.696 | 8.488 |
| RAD-NeRF [26] | Whisper | Mean | 28.07 | 0.9224 | 0.0761 | 2.826 | 24.04 | 1.964 | 6.966 |
| ER-NeRF [28] | Deep-Speech | Obama | 26.44 | 0.9339 | 0.0441 | 2.561 | 7.14 | 1.923 | 7.201 |
| ER-NeRF [28] | Deep-Speech | Donya | 28.91 | 0.9165 | 0.0605 | 2.647 | 14.59 | 1.874 | 4.722 |
| ER-NeRF [28] | Deep-Speech | Shaheen | 29.92 | 0.9267 | 0.0450 | 2.900 | 16.10 | 2.668 | 8.215 |
| ER-NeRF [28] | Deep-Speech | Mean | 28.16 | 0.9257 | 0.0689 | 2.7932 | 20.92 | 2.155 | 6.712 |
| ER-NeRF [28] | HuBERT | Obama | 26.30 | 0.9297 | 0.0473 | 2.758 | 8.33 | 1.711 | 0.300 |
| ER-NeRF [28] | HuBERT | Donya | 24.20 | 0.7826 | 0.1255 | 2.545 | 49.81 | 2.284 | 0.408 |
| ER-NeRF [28] | HuBERT | Shaheen | 30.45 | 0.9322 | 0.0420 | 2.852 | 16.56 | 3.172 | 0.434 |
| ER-NeRF [28] | HuBERT | Mean | 26.98 | 0.8815 | 0.0716 | 2.718 | 24.90 | 2.389 | 0.380 |
| ER-NeRF [28] | Wav2Vec | Obama | 25.59 | 0.9268 | 0.0497 | 2.645 | 8.83 | 1.704 | 6.616 |
| ER-NeRF [28] | Wav2Vec | Donya | 24.21 | 0.7777 | 0.1509 | 2.754 | 68.20 | 1.730 | 4.403 |
| ER-NeRF [28] | Wav2Vec | Shaheen | 29.81 | 0.9245 | 0.0470 | 3.003 | 15.59 | 2.948 | 7.917 |
| ER-NeRF [28] | Wav2Vec | Mean | 26.53 | 0.8763 | 0.0825 | 2.800 | 30.87 | 2.127 | 6.312 |
| ER-NeRF [28] | Whisper | Obama | 26.30 | 0.9314 | 0.0462 | 2.501 | 8.06 | 1.797 | 7.647 |
| ER-NeRF [28] | Whisper | Donya | 27.36 | 0.9020 | 0.0641 | 2.516 | 14.67 | 1.852 | 5.704 |
| ER-NeRF [28] | Whisper | Shaheen | 30.20 | 0.9305 | 0.0434 | 2.935 | 15.61 | 3.030 | 8.575 |
| ER-NeRF [28] | Whisper | Mean | 28.12 | 0.9213 | 0.0654 | 2.7640 | 19.29 | 2.226 | 7.308 |
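
The page does not say how the Mean rows in the table are aggregated. As a minimal sketch, assuming each Mean row is the unweighted arithmetic mean of the Obama, Donya, and Shaheen rows for the same method and AFE, the Python snippet below recomputes the RAD-NeRF / Deep-Speech block. The names `METRICS`, `rows`, and `column_means` are illustrative and not part of the paper's code; `Sync_conf` stands for the table's last (sync confidence) column.

```python
from statistics import mean

# Column order as in the table; the last entry is the sync confidence column.
METRICS = ["PSNR", "SSIM", "LPIPS", "LMD", "FID", "AUE", "Sync_conf"]

# Per-dataset values copied from the RAD-NeRF / Deep-Speech rows of the table.
rows = {
    "Obama":   [27.14, 0.9304, 0.0738, 2.675, 31.29, 1.995, 7.171],
    "Donya":   [27.79, 0.9045, 0.0917, 2.750, 12.82, 1.911, 4.720],
    "Shaheen": [30.13, 0.9314, 0.0697, 3.199, 33.05, 2.837, 7.330],
}


def column_means(rows):
    """Return the unweighted mean of each metric across all datasets.

    Assumption: every dataset contributes equally; the source does not
    state the aggregation rule used for its Mean rows.
    """
    columns = zip(*rows.values())  # transpose: one tuple per metric
    return {name: mean(col) for name, col in zip(METRICS, columns)}


means = column_means(rows)
for name, value in means.items():
    print(f"{name}: {value:.4f}")
```

Under that assumption, the recomputed values agree with the listed RAD-NeRF / Deep-Speech Mean row to within 0.001 on every metric. The same function can be pointed at any other method and AFE block to check its Mean row.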