notesum.ai

Published on December 4

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

cs.CV

Release date: December 4, 2024

Authors: Jiahao Lu¹, Tianyu Huang², Peng Li¹, Zhiyang Dou³, Cheng Lin, Zhiming Cui⁴, Zhen Dong⁵, Sai-Kit Yeung¹, Wenping Wang⁶, Yuan Liu⁷

Affiliations: ¹HKUST; ²CUHK; ³HKU; ⁴ShanghaiTech; ⁵WHU; ⁶TAMU; ⁷NTU

arXiv: http://arxiv.org/pdf/2412.03079v1


Dataset groups: Indoors & outdoors (Hard) = Sintel, PointOdyssey val, FlyingThings3D test; Indoors (Easy) = Bonn 5 scenes, TUM dynamics. Abs Rel $\downarrow$: lower is better; $\delta<1.25\uparrow$: higher is better; "/" = no result reported.

Category	Method	Sintel Abs Rel $\downarrow$	Sintel $\delta<1.25\uparrow$	PointOdyssey val Abs Rel $\downarrow$	PointOdyssey val $\delta<1.25\uparrow$	FlyingThings3D test Abs Rel $\downarrow$	FlyingThings3D test $\delta<1.25\uparrow$	Bonn 5 scenes Abs Rel $\downarrow$	Bonn 5 scenes $\delta<1.25\uparrow$	TUM dynamics Abs Rel $\downarrow$	TUM dynamics $\delta<1.25\uparrow$
Single-frame depth	Depth Anything V2 [54]	0.348	0.592	0.214	0.688	0.267	0.616	0.118	0.882	0.184	0.750
Single-frame depth	Depth Pro [4]	0.418	0.559	0.167	0.779	0.322	0.537	0.067	0.974	0.106	0.887
Video depth	ChronoDepth [36]	0.687	0.486	0.210	0.707	0.288	0.633	0.100	0.911	0.151	0.825
Video depth	DepthCrafter [15]	0.292	0.697	0.229	0.675	/	/	0.075	0.971	0.176	0.744
Joint video depth & pose	DUSt3R [44]	0.422	0.542	0.184	0.743	0.140	0.817	0.154	0.839	0.202	0.775
Joint video depth & pose	MonST3R [60]	0.335	0.586	0.089	0.909	0.132	0.836	0.082	0.953	0.140	0.841
Joint video depth & pose	Ours (Depth Anything V2)	0.253	0.681	0.078	0.929	0.106	0.890	0.075	0.972	0.109	0.915
Joint video depth & pose	Ours (Depth Pro)	0.263	0.641	0.077	0.930	0.102	0.895	0.068	0.969	0.112	0.884
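The table reports two standard depth metrics. Under their usual definitions, Abs Rel is the mean of $|d - d^*|/d^*$ over valid pixels (with $d$ the predicted and $d^*$ the ground-truth depth), and $\delta<1.25$ is the fraction of valid pixels whose ratio $\max(d/d^*, d^*/d)$ falls below 1.25. The minimal NumPy sketch below implements these definitions. The evaluation protocol behind the numbers above is not stated in this excerpt; details such as scale or scale-and-shift alignment, depth caps, and per-frame versus per-sequence averaging are unknown. The optional median-scaling step and the `gt > 0` validity mask are therefore assumptions, and `depth_metrics` and `median_scale_align` are hypothetical helper names.

```python
import numpy as np


def median_scale_align(pred, gt, mask):
    """Hypothetical alignment step (an assumption, not the paper's protocol):
    rescale pred so its median over valid pixels matches the GT median."""
    scale = np.median(gt[mask]) / np.median(pred[mask])
    return pred * scale


def depth_metrics(pred, gt, align=False):
    """Return (Abs Rel, delta<1.25) for one depth map under the usual definitions."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0  # assumption: non-positive GT depth marks invalid pixels
    if align:
        pred = median_scale_align(pred, gt, mask)
    g = gt[mask]
    # Clip predictions to a small positive value so the ratio gt/pred stays finite.
    p = np.clip(pred[mask], 1e-8, None)
    abs_rel = np.mean(np.abs(p - g) / g)  # mean relative absolute error
    ratio = np.maximum(p / g, g / p)      # symmetric ratio, always >= 1
    delta1 = np.mean(ratio < 1.25)        # fraction of pixels within a 1.25x factor
    return float(abs_rel), float(delta1)


if __name__ == "__main__":
    gt = np.array([[1.0, 2.0], [4.0, 0.0]])    # the 0.0 pixel is treated as invalid
    pred = np.array([[1.1, 2.0], [6.0, 3.0]])
    print(depth_metrics(pred, gt))             # roughly (0.2, 0.667)
```

In the usage example, the valid pixels give relative errors of 0.1, 0 and 0.5, so Abs Rel is 0.2. Two of the three ratios (1.1 and 1.0) are below 1.25, so $\delta<1.25$ is 2/3. Passing `align=True` removes a global scale error. A prediction of `3 * gt` then scores perfectly, which illustrates why the alignment choice matters when comparing relative-depth methods.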