How Far is Video Generation from World Model: A Physical Law Perspective
cs.CV
cs.AI
Released Date: November 4, 2024
Authors: Bingyi Kang¹, Yang Yue², Rui Lu², Zhijie Lin¹, Yang Zhao¹, Kaixin Wang³, Gao Huang², Jiashi Feng¹
Affiliations: ¹Bytedance Research; ²Tsinghua University; ³Technion

| Model | Layers | Hidden size | Heads | #Param |
|---|---|---|---|---|
| DiT-S | 12 | 384 | 6 | 22.5M |
| DiT-B | 12 | 768 | 12 | 89.5M |
| DiT-L | 24 | 1024 | 16 | 310.0M |
| DiT-XL | 28 | 1152 | 16 | 456.0M |
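
As a rough sanity check on the table above, the sketch below derives the per-head dimension for each variant and compares the reported parameter counts against a common rule of thumb for standard transformer blocks: about 12·L·h² weights (4h² for the attention projections plus 8h² for an MLP with 4x expansion). This rule is an assumption for illustration, not the paper's own accounting; it ignores embeddings, biases, normalization, and any conditioning layers. The names `DiTConfig`, `head_dim`, and `approx_block_params` are hypothetical helpers, not part of any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTConfig:
    """One row of the model table (hypothetical container)."""
    name: str
    layers: int
    hidden_size: int
    heads: int
    reported_params_m: float  # "#Param" column, in millions


# Values copied from the table above.
CONFIGS = [
    DiTConfig("DiT-S", 12, 384, 6, 22.5),
    DiTConfig("DiT-B", 12, 768, 12, 89.5),
    DiTConfig("DiT-L", 24, 1024, 16, 310.0),
    DiTConfig("DiT-XL", 28, 1152, 16, 456.0),
]


def head_dim(cfg: DiTConfig) -> int:
    """Per-head dimension: hidden size split evenly across heads."""
    assert cfg.hidden_size % cfg.heads == 0, "hidden size must divide by heads"
    return cfg.hidden_size // cfg.heads


def approx_block_params(cfg: DiTConfig) -> int:
    """Rule-of-thumb weight count for the transformer blocks only.

    Assumption: each block has 4*h^2 attention weights (Q, K, V, output)
    and 8*h^2 MLP weights (4x expansion). Embeddings, biases, norms, and
    conditioning layers are deliberately left out.
    """
    return cfg.layers * 12 * cfg.hidden_size ** 2


if __name__ == "__main__":
    for cfg in CONFIGS:
        est_m = approx_block_params(cfg) / 1e6
        print(
            f"{cfg.name}: head_dim={head_dim(cfg)}, "
            f"estimated={est_m:.1f}M, reported={cfg.reported_params_m:.1f}M"
        )
```

Under this approximation the estimates fall slightly below the reported counts for every variant, which is the expected direction given the terms the rule leaves out.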