Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Categories: cs.RO, cs.AI, cs.CL, cs.CV, cs.LG
Released Date: December 5, 2024
Authors: Yi Chen¹, Yuying Ge², Yizhuo Li¹, Yixiao Ge², Mingyu Ding³, Ying Shan², Xihui Liu¹
Affiliations: ¹The University of Hong Kong; ²ARC Lab, Tencent PCG; ³University of California, Berkeley
![[Uncaptioned image]](https://arxiv.org/html/2412.04445v1/x1.png)
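| Method | Pick Coke Can: Horizontal | Pick Coke Can: Vertical | Pick Coke Can: Standing | Pick Coke Can: Average | Move Near: Average | Open / Close Drawer: Open | Open / Close Drawer: Close | Open / Close Drawer: Average | Overall: Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |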
| RT-1-X [5] | 0.820 | 0.330 | 0.550 | 0.567 | 0.317 | 0.296 | 0.891 | 0.597 | 0.534 |
| RT-2-X [62] | 0.740 | 0.740 | 0.880 | 0.787 | 0.779 | 0.157 | 0.343 | 0.250 | 0.607 |
| Octo-Base [42] | 0.210 | 0.210 | 0.090 | 0.170 | 0.042 | 0.009 | 0.444 | 0.227 | 0.169 |
| OpenVLA [28] | 0.270 | 0.030 | 0.190 | 0.163 | 0.462 | 0.194 | 0.518 | 0.356 | 0.248 |
| Moto | 0.820 | 0.500 | 0.900 | 0.740 | 0.604 | 0.130 | 0.732 | 0.431 | 0.614 |
| Moto w/o Motion Token | 0.600 | 0.190 | 0.740 | 0.503 | 0.554 | 0.000 | 0.796 | 0.398 | 0.480 |