DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
cs.RO
cs.AI
Released Date: November 7, 2024
Authors: Gaoyue Zhou¹, Hengkai Pan¹, Yann LeCun², Lerrel Pinto¹
Affiliations: ¹Courant Institute, New York University; ²Courant Institute, New York University; Meta-FAIR

| Method | LPIPS (PushT) | LPIPS (Wall) | LPIPS (Rope) | LPIPS (Granular) | SSIM (PushT) | SSIM (Wall) | SSIM (Rope) | SSIM (Granular) |
|---|---|---|---|---|---|---|---|---|
| R3M | 0.045 | 0.0083 | 0.023 | 0.08 | 0.956 | 0.994 | 0.982 | 0.917 |
| ResNet | 0.063 | 0.0024 | 0.025 | 0.08 | 0.950 | 0.996 | 0.980 | 0.915 |
| DinoCLS | 0.039 | 0.004 | 0.029 | 0.086 | 0.973 | 0.996 | 0.980 | 0.912 |
| AVDC | 0.046 | 0.030 | 0.060 | 0.106 | 0.959 | 0.983 | 0.979 | 0.909 |
| Ours | 0.007 | 0.0016 | 0.009 | 0.035 | 0.985 | 0.997 | 0.985 | 0.940 |
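
One way to read the table: LPIPS is a perceptual distance, so lower is better, while SSIM is a similarity score, so higher is better. The sketch below copies the table's numbers into plain dictionaries, picks the best method per environment, and computes how much the "Ours" row (presumably DINO-WM, given the title) reduces LPIPS relative to the strongest baseline. The helper names `best_per_env` and `lpips_reduction_vs_best_baseline` are illustrative and do not come from the paper. Nothing is recomputed from images.

```python
# Values transcribed from the table above; column order follows ENVS.
ENVS = ["PushT", "Wall", "Rope", "Granular"]

LPIPS = {
    "R3M":     [0.045, 0.0083, 0.023, 0.08],
    "ResNet":  [0.063, 0.0024, 0.025, 0.08],
    "DinoCLS": [0.039, 0.004,  0.029, 0.086],
    "AVDC":    [0.046, 0.030,  0.060, 0.106],
    "Ours":    [0.007, 0.0016, 0.009, 0.035],
}

SSIM = {
    "R3M":     [0.956, 0.994, 0.982, 0.917],
    "ResNet":  [0.950, 0.996, 0.980, 0.915],
    "DinoCLS": [0.973, 0.996, 0.980, 0.912],
    "AVDC":    [0.959, 0.983, 0.979, 0.909],
    "Ours":    [0.985, 0.997, 0.985, 0.940],
}


def best_per_env(scores, lower_is_better):
    """Return {environment: method with the best score in that column}."""
    pick = min if lower_is_better else max
    best = {}
    for i, env in enumerate(ENVS):
        best[env] = pick(scores, key=lambda method: scores[method][i])
    return best


def lpips_reduction_vs_best_baseline(ours="Ours"):
    """Fractional LPIPS reduction of `ours` against the best other row."""
    reductions = {}
    for i, env in enumerate(ENVS):
        baseline = min(v[i] for name, v in LPIPS.items() if name != ours)
        reductions[env] = 1.0 - LPIPS[ours][i] / baseline
    return reductions


if __name__ == "__main__":
    print("Best LPIPS:", best_per_env(LPIPS, lower_is_better=True))
    print("Best SSIM: ", best_per_env(SSIM, lower_is_better=False))
    for env, frac in lpips_reduction_vs_best_baseline().items():
        print(f"{env}: {frac:.1%} lower LPIPS than the best baseline")
```

The reduction is measured against the best baseline in each column rather than a fixed method, because the strongest baseline differs by environment (for example, ResNet on Wall and DinoCLS on PushT).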