notesum.ai

Published on November 7

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

cs.RO

cs.AI

Released Date: November 7, 2024

Authors: Gaoyue Zhou¹, Hengkai Pan¹, Yann LeCun², Lerrel Pinto¹

Aff.: ¹Courant Institute, New York University; ²Courant Institute, New York University; Meta-FAIR

arXiv: http://arxiv.org/abs/2411.04983v1


	LPIPS $\downarrow$ (lower is better)				SSIM $\uparrow$ (higher is better)
Method	PushT	Wall	Rope	Granular	PushT	Wall	Rope	Granular
R3M	0.045	0.0083	0.023	0.08	0.956	0.994	0.982	0.917
ResNet	0.063	0.0024	0.025	0.08	0.950	0.996	0.980	0.915
DinoCLS	0.039	0.004	0.029	0.086	0.973	0.996	0.980	0.912
AVDC	0.046	0.030	0.060	0.106	0.959	0.983	0.979	0.909
Ours	0.007	0.0016	0.009	0.035	0.985	0.997	0.985	0.940
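
The comparison in the table can be checked mechanically. The following is a minimal sketch, not the authors' evaluation code: it transcribes the scores above and picks the best method per environment, treating LPIPS as lower-is-better and SSIM as higher-is-better. The names `ENVS`, `LPIPS`, `SSIM`, and `best_per_env` are hypothetical and introduced only for illustration.

```python
# Scores transcribed from the table above.
# Column order in every list: PushT, Wall, Rope, Granular.
ENVS = ["PushT", "Wall", "Rope", "Granular"]

LPIPS = {  # lower is better
    "R3M":     [0.045, 0.0083, 0.023, 0.08],
    "ResNet":  [0.063, 0.0024, 0.025, 0.08],
    "DinoCLS": [0.039, 0.004, 0.029, 0.086],
    "AVDC":    [0.046, 0.030, 0.060, 0.106],
    "Ours":    [0.007, 0.0016, 0.009, 0.035],
}

SSIM = {  # higher is better
    "R3M":     [0.956, 0.994, 0.982, 0.917],
    "ResNet":  [0.950, 0.996, 0.980, 0.915],
    "DinoCLS": [0.973, 0.996, 0.980, 0.912],
    "AVDC":    [0.959, 0.983, 0.979, 0.909],
    "Ours":    [0.985, 0.997, 0.985, 0.940],
}


def best_per_env(scores, lower_is_better):
    """Return {environment: best method} for one metric (hypothetical helper)."""
    pick = min if lower_is_better else max
    return {
        env: pick(scores, key=lambda method: scores[method][i])
        for i, env in enumerate(ENVS)
    }


if __name__ == "__main__":
    print("LPIPS:", best_per_env(LPIPS, lower_is_better=True))
    print("SSIM: ", best_per_env(SSIM, lower_is_better=False))
```

On these numbers, the "Ours" row is selected for every environment under both metrics.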