4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
cs.CV
Released Date: December 5, 2024
Authors: Chaoyang Wang¹, Peiye Zhuang¹, Tuan Duc Ngo², Willi Menapace¹, Aliaksandr Siarohin¹, Michael Vasilkovsky¹, Ivan Skorokhodov¹, Sergey Tulyakov¹, Peter Wonka³, Hsin-Ying Lee¹
Affiliations: ¹ Snap Inc; ² Snap Inc, UMass Amherst; ³ Snap Inc, KAUST

| Method | FID | CLIP | FVD (Time) | FVD (View) | FVD-Test (Time) | FVD-Test (View) | Visual Quality (Time) | Visual Quality (View) | Temporal Consist. (Time) | Temporal Consist. (View) | Factual Consist. (Time) | Factual Consist. (View) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SV4D [44] | 204.81 | 19.46 | 1053.10 | 1245.42 | 814.50 | 323.99 | 2.26 | 2.02 | 2.03 | 1.68 | 2.12 | 1.99 |
| MotionCtrl [42] | 87.10 | 20.20 | 1556.36 | 1509.76 | 1170.04 | 302.18 | 2.36 | 2.30 | 2.38 | 2.25 | 2.38 | 2.33 |
| Sequential | 96.64 | 28.16 | 1662.54 | 1797.15 | 897.08 | 597.19 | 2.30 | 2.28 | 2.21 | 2.15 | 2.23 | 2.20 |
| Soft w/o Obj | 80.17 | 28.11 | 1392.48 | 1720.47 | 318.18 | 302.18 | 2.41 | 2.39 | 2.37 | 2.31 | 2.35 | 2.33 |
| Hard Sync | 79.92 | 28.16 | 972.87 | 1045.35 | 316.14 | 251.44 | 2.42 | 2.40 | 2.40 | 2.33 | 2.37 | 2.34 |
| Soft Sync | 78.36 | 28.22 | 906.16 | 1036.00 | 308.15 | 261.02 | 2.43 | 2.42 | 2.41 | 2.36 | 2.38 | 2.36 |