Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis
Categories: cs.CV, cs.LG, cs.SD, eess.AS
Release Date: November 29, 2024
Authors: Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang
Affiliation: Ant Group

| Method | FID ↓ | FVD ↓ | CSIM ↑ | Sync-C ↑ | Sync-D ↓ | RTF ↓ |
|---|---|---|---|---|---|---|
| GT | - | - | - | 8.044 | 6.943 | - |
| MuseTalk | 21.445 | 436.862 | 0.807 | 5.586 | 8.400 | 2.248 |
| EchoMimic | 42.554 | 395.754 | 0.840 | 5.733 | 9.204 | 35.528 |
| Hallo | 22.996 | 271.680 | 0.812 | 7.652 | 7.590 | 53.082 |
| Hallo2 | 22.899 | 245.236 | 0.806 | 7.737 | 7.608 | 56.838 |
| Ours-s50 | 17.254 | 219.368 | 0.864 | 8.069 | 7.114 | 2.121 |
| Ours-s10 | 17.060 | 231.182 | 0.861 | 8.111 | 7.291 | 0.635 |
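
The RTF column is easier to interpret with a little arithmetic. The sketch below assumes RTF means real-time factor (processing time divided by the duration of the generated output), so a value below 1 would indicate faster-than-realtime synthesis; the table itself does not define the term. The helper names `realtime_methods` and `speedup_over` are hypothetical and used only for illustration; the numbers are copied from the table above.

```python
# Rows copied from the table above:
# (method, FID, FVD, CSIM, Sync-C, Sync-D, RTF). The GT row is omitted
# because it reports no RTF.
ROWS = [
    ("MuseTalk", 21.445, 436.862, 0.807, 5.586, 8.400, 2.248),
    ("EchoMimic", 42.554, 395.754, 0.840, 5.733, 9.204, 35.528),
    ("Hallo", 22.996, 271.680, 0.812, 7.652, 7.590, 53.082),
    ("Hallo2", 22.899, 245.236, 0.806, 7.737, 7.608, 56.838),
    ("Ours-s50", 17.254, 219.368, 0.864, 8.069, 7.114, 2.121),
    ("Ours-s10", 17.060, 231.182, 0.861, 8.111, 7.291, 0.635),
]

RTF_INDEX = 6  # position of the RTF value in each row


def realtime_methods(rows, threshold=1.0):
    """Return the methods whose RTF is below the threshold.

    Assumption: RTF < 1 means generation is faster than real time.
    """
    return [row[0] for row in rows if row[RTF_INDEX] < threshold]


def speedup_over(rows, baseline, method):
    """Return how many times lower the RTF of `method` is than `baseline`."""
    rtf = {row[0]: row[RTF_INDEX] for row in rows}
    return rtf[baseline] / rtf[method]


print(realtime_methods(ROWS))                              # ['Ours-s10']
print(round(speedup_over(ROWS, "Hallo2", "Ours-s10"), 1))  # 89.5
```

Under that reading of RTF, Ours-s10 is the only listed configuration below 1, and its RTF is roughly 90 times lower than that of Hallo2.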