notesum.ai
Published at November 27
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
cs.CV
Released Date: November 27, 2024
Authors: Yueru Jia¹, Jiaming Liu², Sixiang Chen², Chenyang Gu², Zhilue Wang², Longzan Luo², Lily Lee², Pengwei Wang³, Zhongyuan Wang³, Renrui Zhang⁴, Shanghang Zhang¹
Affiliations: ¹ State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence (BAAI); ² State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; ³ Beijing Academy of Artificial Intelligence (BAAI); ⁴ CUHK
![[Uncaptioned image]](https://arxiv.org/html/2411.18623v1/x1.png)
| Method | Type | Input Type | MetaWorld: Easy (7) | MetaWorld: Medium (5) | MetaWorld: Hard (2) | MetaWorld: Very Hard (1) | MetaWorld: Mean S.R. | Adroit: Hammer | Adroit: Door | Adroit: Pen | Adroit: Mean S.R. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CLIP [61] | 2D Rep. | RGB | 72.9 | 68.8 | 50.0 | 26.0 | 65.3 | 100 | 100 | 52 | 84.0 |
| R3M [54] | 2D Rep. | RGB | 84.6 | 66.4 | 83.0 | 36.0 | 75.1 | 100 | 100 | 56 | 85.3 |
| VC-1 [52] | 2D Rep. | RGB | 62.6 | 71.6 | 52.0 | 12.0 | 60.8 | 88 | 100 | 48 | 78.7 |
| PointNet [57] | 3D Rep. | PC | 72.0 | 37.6 | 66.0 | 14.0 | 55.9 | 60 | 100 | 48 | 69.3 |
| PointNet++ [58] | 3D Rep. | PC | 70.3 | 59.6 | 61.0 | 12.0 | 61.6 | 68 | 100 | 60 | 76.0 |
| PointNeXt [59] | 3D Rep. | PC | 82.6 | 62.8 | 59.0 | 20.0 | 68.7 | 52 | 96 | 48 | 65.3 |
| SPA [96] | 3D Rep. | RGB | 72.6 | 77.4 | 66.0 | 16.0 | 69.5 | 100 | 100 | 44 | 81.3 |
| DP3 [86] | 3D Policy | PC | 85.7 | 49.6 | 57.0 | 18.0 | 65.3 | 88 | 100 | 12 | 66.7 |
| Lift3D (DINOV2) | Ours | PC | 93.1 | 82.4 | 88.0 | 28.0 | 84.5 | 100 | 100 | 56 | 85.3 |
| Lift3D (CLIP) | Ours | PC | 94.0 | 78.8 | 82.0 | 42.0 | 83.9 | 100 | 100 | 64 | 88.0 |
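
The table does not say how the Mean S.R. columns are computed. As a rough sanity check, the sketch below assumes that the numbers in parentheses in the MetaWorld headers are task counts per difficulty level and that the mean is a task-count-weighted average of the per-level success rates. Under that assumption the recomputed values agree with the reported ones to within about 0.1, which is what rounding of the per-level entries would explain. The function and variable names are illustrative and do not come from the paper.

```python
def weighted_mean(rates, counts):
    """Average of per-group success rates, weighted by the number of tasks per group."""
    if len(rates) != len(counts):
        raise ValueError("rates and counts must have the same length")
    total = sum(counts)
    return sum(r * c for r, c in zip(rates, counts)) / total


# Assumed task counts for Easy, Medium, Hard, Very Hard (taken from the column headers).
METAWORLD_TASK_COUNTS = [7, 5, 2, 1]

# Per-level success rates copied from the table rows.
rows = {
    "CLIP [61]": ([72.9, 68.8, 50.0, 26.0], 65.3),
    "DP3 [86]": ([85.7, 49.6, 57.0, 18.0], 65.3),
    "Lift3D (CLIP)": ([94.0, 78.8, 82.0, 42.0], 83.9),
}

for method, (rates, reported) in rows.items():
    recomputed = weighted_mean(rates, METAWORLD_TASK_COUNTS)
    # Report the gap rather than asserting equality, since table entries are rounded.
    print(f"{method}: recomputed={recomputed:.2f}, reported={reported}, "
          f"gap={abs(recomputed - reported):.2f}")

# Adroit has one task per column, so equal weights reduce to a plain average.
adroit_lift3d_clip = weighted_mean([100, 100, 64], [1, 1, 1])
print(f"Lift3D (CLIP) on Adroit: {adroit_lift3d_clip:.2f}")
```

The same helper can be applied to any other row; a gap noticeably larger than 0.1 would suggest either a transcription error in the table or a different averaging scheme than the one assumed here.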