VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
cs.CV
cs.AI
Released Date: December 3, 2024
Authors: Mingzhe Zheng¹, Yongqi Xu², Haojian Huang³, Xuran Ma¹, Yexin Liu¹, Wenjie Shu¹, Yatian Pang⁴, Feilong Tang¹, Qifeng Chen¹, Harry Yang¹, Ser-Nam Lim⁵
Affiliations: ¹Hong Kong University of Science and Technology; ²Peking University; ³University of Hong Kong; ⁴National University of Singapore; ⁵University of Central Florida
![[Uncaptioned image]](https://arxiv.org/html/2412.02259v1/x1.png)
| Model | CLIP ($p_{cha}$) | CLIP ($p_{b}$) | CLIP ($p_{r}$) | CLIP ($p_{cam}$) | CLIP ($p_{h}$) | FC (Within-Shot) | FC (Cross-Shot) | SC (Within-Shot) | SC (Cross-Shot) |
|---|---|---|---|---|---|---|---|---|---|
| EasyAnimate [69] | 0.4086 | 0.2429 | 0.1633 | 0.1130 | 0.0722 | 0.4705 | 0.0268 | 0.7969 | 0.2037 |
| CogVideo [71] | 0.4113 | 0.2432 | 0.1632 | 0.1122 | 0.0701 | 0.6099 | 0.0222 | 0.7424 | 0.2069 |
| VideoCrafter1 [8] | 0.4365 | 0.2417 | 0.1535 | 0.1032 | 0.0651 | 0.3706 | 0.0350 | 0.7623 | 0.1867 |
| VideoCrafter2 [9] | 0.4000 | 0.2511 | 0.1654 | 0.1140 | 0.0694 | 0.5569 | 0.0686 | 0.7981 | 0.1798 |
| VGoT | 0.4086 | 0.2429 | 0.1633 | 0.1130 | 0.0722 | 0.8138 | 0.2688 | 0.9717 | 0.4276 |
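
To make the consistency columns easier to compare, here is a minimal sketch that finds the strongest baseline for each FC and SC column and reports VGoT's margin over it. The numbers are copied from the table above. The names `METRICS`, `BASELINES`, `VGOT`, and `margins_over_best_baseline` are illustrative assumptions and do not come from the paper, and the sketch assumes higher scores are better in these columns.

```python
# Consistency scores copied from the table above (assumed: higher is better).
# Column order: FC within-shot, FC cross-shot, SC within-shot, SC cross-shot.
METRICS = [
    "FC (Within-Shot)",
    "FC (Cross-Shot)",
    "SC (Within-Shot)",
    "SC (Cross-Shot)",
]

# Hypothetical container names; values are the baseline rows of the table.
BASELINES = {
    "EasyAnimate": [0.4705, 0.0268, 0.7969, 0.2037],
    "CogVideo": [0.6099, 0.0222, 0.7424, 0.2069],
    "VideoCrafter1": [0.3706, 0.0350, 0.7623, 0.1867],
    "VideoCrafter2": [0.5569, 0.0686, 0.7981, 0.1798],
}
VGOT = [0.8138, 0.2688, 0.9717, 0.4276]


def margins_over_best_baseline():
    """Return {metric: (best_baseline_name, best_score, vgot_minus_best)}."""
    result = {}
    for i, metric in enumerate(METRICS):
        # Pick the baseline with the highest score in column i.
        best_name = max(BASELINES, key=lambda name: BASELINES[name][i])
        best_score = BASELINES[best_name][i]
        result[metric] = (best_name, best_score, round(VGOT[i] - best_score, 4))
    return result


if __name__ == "__main__":
    for metric, (name, score, margin) in margins_over_best_baseline().items():
        print(f"{metric}: best baseline {name} = {score:.4f}, VGoT margin = +{margin:.4f}")
```

On these numbers, the strongest baseline is CogVideo for within-shot FC and cross-shot SC, and VideoCrafter2 for cross-shot FC and within-shot SC. VGoT's absolute margin over the best baseline falls between roughly 0.17 and 0.22 in all four columns.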