notesum.ai

Published on December 5

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations

cs.CV

cs.AI

Release Date: December 5, 2024

Authors: Yongming Zhu¹, Longhao Zhang¹, Zhengkun Rong¹, Tianshu Hu¹, Shuang Liang¹, Zhipeng Ge¹

Affiliation: ¹ByteDance

arXiv: http://arxiv.org/pdf/2412.04037v1


(↑ higher is better, ↓ lower is better)
Methods	SSIM ↑	PSNR ↑	FID ↓	SyncScore ↑	LPIPS ↓	CSIM ↑	SID ↑	Var ↑
DIM [39]	0.651	20.417	34.361	4.778	0.485	0.824	0.766	0.825
INFP (Ours)	0.834	31.562	15.727	7.188	0.257	0.904	2.613	2.386
w/o Motion Memory	0.830	31.218	18.334	6.103	0.259	0.899	2.153	2.016
w/o Style Modulation	0.831	31.442	16.029	7.062	0.271	0.904	2.551	2.316
w/ Intact Image	0.802	28.488	16.990	6.812	0.266	0.842	2.470	2.148
w/ Landmarks Map	0.821	30.693	16.327	6.833	0.281	0.901	2.601	2.335
GT	1.000	N/A	0.000	7.261	0.000	0.967	2.891	2.435
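To make the ablation rows easier to read, the short Python sketch below computes how much each variant changes every metric relative to the full INFP model. The numbers are copied from the table. The direction of each metric follows the header arrows, and each change is sign-flipped for lower-is-better metrics so that a negative value always means "worse". The GT row is left out because its PSNR is N/A. The helper names (`relative_change`, `most_affected_metric`) are hypothetical reading aids, not part of the paper.

```python
"""Summarize the ablation rows: percent change of each variant vs. INFP (Ours)."""

# Metric direction from the table header: True = higher is better, False = lower is better.
HIGHER_IS_BETTER = {
    "SSIM": True, "PSNR": True, "FID": False, "SyncScore": True,
    "LPIPS": False, "CSIM": True, "SID": True, "Var": True,
}
METRICS = list(HIGHER_IS_BETTER)  # dict order matches the table's column order

# Rows copied from the table (GT omitted because its PSNR is N/A).
ROWS = {
    "DIM [39]":             [0.651, 20.417, 34.361, 4.778, 0.485, 0.824, 0.766, 0.825],
    "INFP (Ours)":          [0.834, 31.562, 15.727, 7.188, 0.257, 0.904, 2.613, 2.386],
    "w/o Motion Memory":    [0.830, 31.218, 18.334, 6.103, 0.259, 0.899, 2.153, 2.016],
    "w/o Style Modulation": [0.831, 31.442, 16.029, 7.062, 0.271, 0.904, 2.551, 2.316],
    "w/ Intact Image":      [0.802, 28.488, 16.990, 6.812, 0.266, 0.842, 2.470, 2.148],
    "w/ Landmarks Map":     [0.821, 30.693, 16.327, 6.833, 0.281, 0.901, 2.601, 2.335],
}
REFERENCE = "INFP (Ours)"
ABLATIONS = ["w/o Motion Memory", "w/o Style Modulation", "w/ Intact Image", "w/ Landmarks Map"]


def relative_change(variant: str, reference: str = REFERENCE) -> dict[str, float]:
    """Percent change of `variant` relative to `reference`.

    The sign is flipped for lower-is-better metrics (FID, LPIPS), so a negative
    value always means the variant is worse than the reference on that metric.
    """
    changes = {}
    for metric, v, r in zip(METRICS, ROWS[variant], ROWS[reference]):
        pct = (v - r) / r * 100.0
        changes[metric] = pct if HIGHER_IS_BETTER[metric] else -pct
    return changes


def most_affected_metric(variant: str) -> str:
    """Return the metric that degrades the most when switching to `variant`."""
    changes = relative_change(variant)
    return min(changes, key=changes.get)


if __name__ == "__main__":
    for name in ABLATIONS:
        changes = relative_change(name)
        worst = most_affected_metric(name)
        print(f"{name:22s} worst: {worst:9s} ({changes[worst]:+.1f}%)")
```

Applied to the table, every ablation is equal to or worse than the full model on every metric. Removing Motion Memory hurts SID the most (about -17.6%). Removing Style Modulation and using a landmarks map both hurt LPIPS the most (about -5.4% and -9.3%). Using the intact image hurts Var the most (about -10.0%).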