Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Categories: cs.MM, cs.GR, cs.SD, eess.AS
Released Date: November 25, 2024
Authors: Xiaozhong Ji¹, Xiaobin Hu¹, Zhihong Xu², Junwei Zhu¹, Chuming Lin¹, Qingdong He¹, Jiangning Zhang¹, Donghao Luo¹, Yi Chen¹, Qin Lin¹, Qinglin Lu¹, Chengjie Wang¹
Aff.: ¹Tencent; ²Zhejiang University
![[Uncaptioned image]](https://arxiv.org/html/2411.16331v1/x1.png)
| Method / Metric | Lip sync | Motion diversity | ID consistency | Video smoothness |
| --- | --- | --- | --- | --- |
| Aniportrait | 1.42 | 1.62 | 3.11 | 2.09 |
| SadTalker | 1.98 | 2.34 | 2.95 | 2.95 |
| Echomimic | 2.77 | 2.65 | 3.48 | 2.71 |
| Hallo2 | 3.15 | 2.37 | 3.34 | 2.94 |
| Sonic (Ours) | 4.58 (45%) | 4.55 (72%) | 4.29 (23%) | 4.66 (58%) |
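
The table does not say how the parenthesized percentages are derived. One reading that reproduces all four values is the relative gain of Sonic over the best competing method in each column, rounded to the nearest percent. The sketch below checks that assumption against the scores in the table; the function name `relative_gains` and the data layout are illustrative, not taken from the paper.

```python
# Scores copied from the table above. Column order:
# lip sync, motion diversity, ID consistency, video smoothness.
baselines = {
    "Aniportrait": [1.42, 1.62, 3.11, 2.09],
    "SadTalker": [1.98, 2.34, 2.95, 2.95],
    "Echomimic": [2.77, 2.65, 3.48, 2.71],
    "Hallo2": [3.15, 2.37, 3.34, 2.94],
}
sonic = [4.58, 4.55, 4.29, 4.66]


def relative_gains(ours, others):
    """Percent gain of `ours` over the best score in `others`, per column.

    Assumption (not stated in the source): gain = (ours - best) / best.
    """
    gains = []
    for i, score in enumerate(ours):
        best = max(row[i] for row in others.values())
        gains.append(round(100 * (score - best) / best))
    return gains


gains = relative_gains(sonic, baselines)
print(gains)  # [45, 72, 23, 58]
```

Under this reading, the strongest baseline differs by column: Hallo2 for lip sync, Echomimic for motion diversity and ID consistency, and SadTalker for video smoothness.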