SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Categories: cs.CV, cs.AI, cs.CL, cs.LG, cs.RO
Released Date: October 30, 2024
Authors: Yining Hong¹, Beide Liu¹, Maxine Wu¹, Yuanhao Zhai², Kai-Wei Chang¹, Lingjie Li³, Kevin Lin³, Chung-Ching Lin³, Jianfeng Wang³, Zhengyuan Yang³, Yingnian Wu¹, Lijuan Wang³
Affiliations: ¹UCLA; ²State University of New York at Buffalo; ³Microsoft Research

| Method | FVD | PSNR | SSIM | LPIPS | SCuts | SRC | Human |
|---|---|---|---|---|---|---|---|
| AVDC | 1408 | 16.96 | 52.63 | 20.65 | 3.13 | 83.89 | 0.478 |
| Streaming-T2V | 990 | 14.87 | 48.33 | 33.00 | 0.89 | 91.02 | 0.814 |
| Runway Gen-3 Turbo | 1763 | 11.15 | 47.29 | 52.71 | 2.46 | 80.26 | 0.205 |
| AnimateDiff | 782 | 17.89 | 52.34 | 33.41 | 2.94 | 90.12 | 0.872 |
| SEINE | 919 | 18.04 | 54.15 | 35.72 | 1.03 | 88.95 | 0.843 |
| iVideoGPT | 1303 | 13.08 | 31.37 | 27.22 | 1.32 | 82.19 | 0.536 |
| Ours (w/o Temp-LoRA) | / | / | / | / | 1.88 | 89.04 | 0.869 |
| Ours (SlowFast-VGen) | 514 | 19.21 | 60.53 | 25.06 | 0.37 | 93.71 | 0.897 |