INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
cs.CV
cs.AI
Released Date: December 5, 2024
Authors: Yongming Zhu¹, Longhao Zhang¹, Zhengkun Rong¹, Tianshu Hu¹, Shuang Liang¹, Zhipeng Ge¹
Aff.: ¹Bytedance
![Teaser figure](https://arxiv.org/html/2412.04037v1/extracted/6046866/figs/teaser-v3.2.png)
| Methods | SSIM | PSNR | FID | SyncScore | LPIPS | CSIM | SID | Var |
|---|---|---|---|---|---|---|---|---|
| DIM [39] | 0.651 | 20.417 | 34.361 | 4.778 | 0.485 | 0.824 | 0.766 | 0.825 |
| INFP (Ours) | 0.834 | 31.562 | 15.727 | 7.188 | 0.257 | 0.904 | 2.613 | 2.386 |
| w/o Motion Memory | 0.830 | 31.218 | 18.334 | 6.103 | 0.259 | 0.899 | 2.153 | 2.016 |
| w/o Style Modulation | 0.831 | 31.442 | 16.029 | 7.062 | 0.271 | 0.904 | 2.551 | 2.316 |
| w/ Intact Image | 0.802 | 28.488 | 16.990 | 6.812 | 0.266 | 0.842 | 2.470 | 2.148 |
| w/ Landmarks Map | 0.821 | 30.693 | 16.327 | 6.833 | 0.281 | 0.901 | 2.601 | 2.335 |
| GT | 1.000 | N/A | 0.000 | 7.261 | 0.000 | 0.967 | 2.891 | 2.435 |
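The N/A in the GT row's PSNR column is what the standard PSNR definition would give: comparing a frame with itself yields zero mean squared error, so the ratio is unbounded. The sketch below illustrates this under stated assumptions. It uses the textbook formula on flat lists of 8-bit pixel values; the function name `psnr` and the sample data `frame` and `noisy` are hypothetical, and this is not the evaluation code behind the table.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Assumes the textbook definition 10 * log10(max_value^2 / MSE); this is an
    illustrative helper, not the paper's evaluation script.
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0.0:
        # Identical inputs: PSNR is unbounded, hence "N/A" for ground truth.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel "frames": every pixel of `noisy` is off by exactly 1.
frame = [0.0, 64.0, 128.0, 255.0]
noisy = [1.0, 65.0, 129.0, 254.0]

print(psnr(frame, noisy))  # finite value in dB
print(psnr(frame, frame))  # inf
```

Higher PSNR means the generated frame is closer to the reference, which is why the table reports it for each method but not for GT.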