notesum.ai
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
cs.CV
Released Date: December 4, 2024
Authors: Hannan Lu¹, Xiaohe Wu, Shudong Wang, Xiameng Qin, Xinyu Zhang, Junyu Han, Wangmeng Zuo, Ji Tao
Affiliation: ¹Harbin Institute of Technology

Column groups: generation quality (FID, FVD); controllability, measured by BEV segmentation (Road, Vehicle, and Drivable mIoU) and 3D object detection (Object NDS). ↓ means lower is better, ↑ means higher is better.

| Method | M-View | M-Frame | FID (↓) | FVD (↓) | Road mIoU (↑) | Vehicle mIoU (↑) | Drivable mIoU (↑) | Object NDS (↑) |
|---|---|---|---|---|---|---|---|---|
| Oracle | | | | | 71.6 | 35.8 | 81.7 | 41.2 |
| BEVGen [27] | ✓ | | 25.5 | - | 50.2 (-21.4%) | 5.9 (-29.9%) | - | - |
| BEVControl [36] | ✓ | | 24.9 | - | 60.8 (-10.8%) | 26.8 (-9.0%) | - | - |
| MagicDrive [8] | ✓ | | 16.2 | - | 61.1 (-10.5%) | 27.0 (-8.8%) | - | 30.6 (-10.6%) |
| DriveDreamer [31] | | ✓ | 52.6 | 452.0 | - | - | - | - |
| Panacea [34] | ✓ | ✓ | 17.0 | 139.0 | - | - | - | - |
| DrivingDiffusion [16] | ✓ | ✓ | 15.8 | 346.0 | 63.2 (-8.4%) | 31.6 (-4.2%) | 67.8 (-13.9%) | 33.1 (-8.1%) |
| MagicDrive-V [7] | ✓ | ✓ | 20.7 | 164.7 | 40.0 (-31.6%) | 22.9 (-12.9%) | 38.6 (-43.1%) | 19.6 (-21.6%) |
| CogDriving (Ours) | ✓ | ✓ | 15.3 | 37.8 | 65.7 (-3.7%) | 32.1 (-3.7%) | 71.9 (-9.8%) | 34.3 (-6.9%) |
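
The parenthesized values in the controllability columns appear to be each method's difference from the Oracle row, in percentage points. The source does not state this; it is inferred from the numbers. A minimal sketch of that arithmetic follows, using the Oracle and MagicDrive rows from the table. The dictionary keys and the `gap_to_oracle` helper are illustrative names, not part of the source.

```python
# Oracle scores copied from the table's Oracle row.
ORACLE = {
    "road_miou": 71.6,
    "vehicle_miou": 35.8,
    "drivable_miou": 81.7,
    "object_nds": 41.2,
}


def gap_to_oracle(metric: str, score: float) -> float:
    """Return score minus the Oracle score, in points, rounded to one decimal.

    Assumption: the table's parenthesized deltas are absolute point
    differences from the Oracle row (inferred, not stated in the source).
    """
    return round(score - ORACLE[metric], 1)


# MagicDrive row from the table (it reports no Drivable mIoU).
magicdrive = {"road_miou": 61.1, "vehicle_miou": 27.0, "object_nds": 30.6}
gaps = {metric: gap_to_oracle(metric, score) for metric, score in magicdrive.items()}
print(gaps)
```

Under this reading, the computed gaps for MagicDrive match the listed -10.5, -8.8, and -10.6. The same calculation for the CogDriving Road mIoU entry (65.7 against an Oracle of 71.6) gives -5.9 rather than the listed -3.7, so that cell may be worth checking against the original paper.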