Published on December 10
Multi-Shot Character Consistency for Text-to-Video Generation
cs.CV
Released Date: December 10, 2024
Authors: Yuval Atzmon¹, Rinon Gal, Yoad Tewel, Yoni Kasten, Gal Chechik
Affiliation: ¹NVIDIA

| Method | Multi-Shot Consistency | Text Similarity | Dynamic Degree | Subject Consistency |
| --- | --- | --- | --- | --- |
| ConsiS Im2Vid | 63.7 ± 1.4 | 27.3 ± 0.5 | 3.3 ± 1.5 | 99.1 ± 0.1 |
| VSTAR | 83.9 ± 1.6 | 19.8 ± 0.4 | 90.7 ± 2.4 | 92.6 ± 0.3 |
| Tokenflow | 65.3 ± 1.5 | 27.9 ± 0.4 | 26.0 ± 3.6 | 97.7 ± 0.2 |
| VideoCrafter2 | 63.2 ± 1.7 | 28.7 ± 0.4 | 29.3 ± 3.7 | 97.3 ± 0.2 |
| Ours + VideoCrafter2 | 68.8 ± 1.8 | 27.7 ± 0.4 | 20.0 ± 3.3 | 97.7 ± 0.2 |
| Turbo-V2 | 63.3 ± 1.7 | 28.6 ± 0.4 | 63.3 ± 3.9 | 96.2 ± 0.2 |
| Ours + Turbo-V2 | 67.3 ± 2.1 | 27.4 ± 0.4 | 62.0 ± 4.0 | 96.8 ± 0.2 |