notesum.ai

Published on December 3

It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model

Categories: cs.SD, cs.CV, cs.GR, cs.MM, eess.AS

Release Date: December 3, 2024

Authors: Mingyi Shi¹, Dafei Qin¹, Leo Ho¹, Zhouyingcheng Liao¹, Yinghao Huang², Junichi Yamagishi³, Taku Komura¹

Affiliations: ¹The University of Hong Kong; ²Great Bay University; ³National Institute of Informatics, Tokyo

arXiv: http://arxiv.org/pdf/2412.02419v1

Method	FPD $\downarrow$ (Motion Quality)	Div. $\uparrow$ (Motion Quality)	Foot.Slid. $\downarrow$ (Motion Quality)	FDD $\downarrow$ (Interaction)	Div. $\uparrow$ (Interaction)
ReMoS [19]	475.32	111.90	0.3050	394.7	34.0
Ours	103.19	10.42	0.0109	133.72	14.13
Ours (w/t audio)	86.79	14.84	0.0141	104.43	19.90
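
To make the size of the gaps in the table above easier to read, the sketch below computes how much lower the "Ours" row is than ReMoS on the three lower-is-better columns (FPD, Foot.Slid., FDD). The numbers are copied from the table; the helper name `relative_reduction` and the choice of relative reduction as a summary are illustrative assumptions, not something the paper reports.

```python
# Values copied from the results table; only the lower-is-better columns are compared.
remos = {"FPD": 475.32, "Foot.Slid.": 0.3050, "FDD": 394.7}
ours = {"FPD": 103.19, "Foot.Slid.": 0.0109, "FDD": 133.72}


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline` (hypothetical helper).

    A positive result means `value` improves on `baseline` for a
    lower-is-better metric.
    """
    return (baseline - value) / baseline


# Relative reduction of "Ours" versus ReMoS, per metric.
reductions = {name: relative_reduction(remos[name], ours[name]) for name in remos}

for name, frac in reductions.items():
    print(f"{name}: {frac:.1%} lower than ReMoS")
```

The two Div. columns are left out because they are higher-is-better, so a reduction would read the wrong way round; the same helper with the arguments swapped would cover them.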