Published on December 4

Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention

cs.CV

Release Date: December 4, 2024

Authors: Hannan Lu¹, Xiaohe Wu, Shudong Wang, Xiameng Qin, Xinyu Zhang, Junyu Han, Wangmeng Zuo, Ji Tao

Aff.: ¹Harbin Institute of Technology

arXiv: http://arxiv.org/pdf/2412.03520v1

Column groups: generation quality (FID, FVD); controllability via BEV segmentation (Road, Vehicle, and Drivable mIoU) and 3D object detection (Object NDS).

| Method | M-View | M-Frame | FID ($\downarrow$) | FVD ($\downarrow$) | Road mIoU ($\uparrow$) | Vehicle mIoU ($\uparrow$) | Drivable mIoU ($\uparrow$) | Object NDS ($\uparrow$) |
|---|---|---|---|---|---|---|---|---|
| Oracle | | | | | 71.6 | 35.8 | 81.7 | 41.2 |
| BEVGen [27] | ✓ | | 25.5 | - | 50.2 (-21.4%) | 5.9 (-29.9%) | - | - |
| BEVControl [36] | ✓ | | 24.9 | - | 60.8 (-10.8%) | 26.8 (-9.0%) | - | - |
| MagicDrive [8] | ✓ | | 16.2 | - | 61.1 (-10.5%) | 27.0 (-8.8%) | - | 30.6 (-10.6%) |
| DriveDreamer [31] | | ✓ | 52.6 | 452.0 | - | - | - | - |
| Panacea [34] | ✓ | ✓ | 17.0 | 139.0 | - | - | - | - |
| DrivingDiffusion [16] | ✓ | ✓ | 15.8 | 346.0 | 63.2 (-8.4%) | 31.6 (-4.2%) | 67.8 (-13.9%) | 33.1 (-8.1%) |
| MagicDrive-V [7] | ✓ | ✓ | 20.7 | 164.7 | 40.0 (-31.6%) | 22.9 (-12.9%) | 38.6 (-43.1%) | 19.6 (-21.6%) |
| CogDriving (Ours) | ✓ | ✓ | 15.3 | 37.8 | 65.7 (-3.7%) | 32.1 (-3.7%) | 71.9 (-9.8%) | 34.3 (-6.9%) |
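
The parenthesized values in the controllability columns appear to be point differences from the Oracle row (method score minus Oracle score). This reading is inferred from the numbers and is not stated in the text. A minimal Python sketch of that arithmetic, using the MagicDrive row as input; the names `ORACLE` and `gap_to_oracle` are hypothetical and chosen only for illustration:

```python
# Oracle row copied from the table above.
ORACLE = {
    "road_miou": 71.6,
    "vehicle_miou": 35.8,
    "drivable_miou": 81.7,
    "object_nds": 41.2,
}


def gap_to_oracle(scores):
    """Return (score - Oracle score) per metric, rounded to one decimal.

    Assumption: the parenthesized table values are absolute point
    differences from the Oracle row. Metrics given as None (shown as
    '-' in the table) are skipped.
    """
    return {
        metric: round(value - ORACLE[metric], 1)
        for metric, value in scores.items()
        if value is not None
    }


# MagicDrive row from the table; drivable mIoU is not reported there.
magicdrive = {
    "road_miou": 61.1,
    "vehicle_miou": 27.0,
    "drivable_miou": None,
    "object_nds": 30.6,
}

print(gap_to_oracle(magicdrive))
```

Running the same function over the other rows is a quick way to check a parenthesized gap against the reported score.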