Four-Plane Factorized Video Autoencoders
cs.CV
Released Date: December 5, 2024
Authors: Mohammed Suhail1, Carlos Esteves1, Leonid Sigal2, Ameesh Makadia1
Aff.: 1Google; 2University of British Columbia

| Method | Class-conditional generation FVD (UCF dataset) | Frame prediction FVD (Kinetics-600 dataset) | Params. | Steps |
|---|---|---|---|---|
| Video Diffusion [24] | - | 16.2 | 1.1B | 256 |
| RIN [27] | - | 10.7 | 411M | 1000 |
| TATS [17] | 332 | - | 321M | 1024 |
| Phenaki [48] | - | 36.4 | 227M | 48 |
| MAGVIT [51] | 76 | 9.9 | 306M | 12 |
| MAGVIT-v2 [53] | 58 | 4.3 | 307M | 24 |
| W.A.L.T [20] | 46 | 3.3 | 313M | 50 |
| W.A.L.T* [20] | 39 | 5.7 | 313M | 50 |
| Ours | 38 | 8.6 | 214M | 50 |