Published on December 4
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Categories: cs.RO, cs.CV
Released Date: December 4, 2024
Authors: Junjie Wen¹, Minjie Zhu², Yichen Zhu³, Zhibin Tang, Jinming Li, Zhongyi Zhou, Chengmeng Li, Xiaoyu Liu, Yaxin Peng, Chaomin Shen, Feifei Feng
Affiliations: ¹East China Normal University; ²Midea Group; ³Shanghai University

ID = In-Distribution; VG = Visual Generalization.

| Model | Pre-trained Trajectories | ID Task 1 | ID Task 2 | ID Task 3 | ID Task 4 | ID Task 5 | ID Avg. | VG Task 1 | VG Task 2 | VG Task 3 | VG Task 4 | VG Task 5 | VG Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Diffusion Policy [10] | - | 66.7 | 36.4 | 0 | 36.4 | 0 | 27.9 | 11.1 | 11.1 | 0 | 22.2 | 0 | 8.9 |
| ScalingDP-1B [74] | - | 66.7 | 36.4 | 27.3 | 37.0 | 18.2 | 35.2 | 33.3 | 33.3 | 0 | 22.2 | 22.2 | 22.2 |
| TinyVLA [57] | - | 72.7 | 45.5 | 36.4 | 45.5 | 27.3 | 45.5 | 44.4 | 44.4 | 11.1 | 22.2 | 22.2 | 28.9 |
| Octo [34] | 970K | 57.6 | 27.3 | 9.1 | 0 | 27.3 | 24.3 | 44.4 | 11.1 | 11.1 | 0 | 22.2 | 17.8 |
| OpenVLA-7B [26] | 970K | 69.7 | 18.2 | 18.2 | 36.4 | 54.5 | 39.4 | 55.6 | 11.1 | 0 | 33.3 | 33.3 | 26.7 |
| DiVLA-2B | 39K | 100 | 100 | 63.6 | 63.6 | 90.9 | 83.6 | 44.4 | 66.7 | 44.4 | 66.7 | 66.7 | 57.8 |
| DiVLA-7B | 39K | 100 | 100 | 72.7 | 81.8 | 100 | 90.9 | 66.7 | 88.9 | 55.6 | 77.8 | 66.7 | 71.0 |
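
The Avg. columns can be checked against the per-task values with a short script. The sketch below is only a sanity check on the numbers as transcribed in the table above, not anything taken from the paper. The names `ROWS`, `check_averages`, and the tolerance `tol` are illustrative assumptions, as is the reading that each Avg. is the unweighted mean of the five task scores.

```python
from statistics import mean

# model -> (in-distribution scores, reported avg, visual-generalization scores, reported avg)
# Values are copied from the table above; the layout of this dict is an assumption.
ROWS = {
    "Diffusion Policy": ([66.7, 36.4, 0, 36.4, 0], 27.9, [11.1, 11.1, 0, 22.2, 0], 8.9),
    "ScalingDP-1B": ([66.7, 36.4, 27.3, 37.0, 18.2], 35.2, [33.3, 33.3, 0, 22.2, 22.2], 22.2),
    "TinyVLA": ([72.7, 45.5, 36.4, 45.5, 27.3], 45.5, [44.4, 44.4, 11.1, 22.2, 22.2], 28.9),
    "Octo": ([57.6, 27.3, 9.1, 0, 27.3], 24.3, [44.4, 11.1, 11.1, 0, 22.2], 17.8),
    "OpenVLA-7B": ([69.7, 18.2, 18.2, 36.4, 54.5], 39.4, [55.6, 11.1, 0, 33.3, 33.3], 26.7),
    "DiVLA-2B": ([100, 100, 63.6, 63.6, 90.9], 83.6, [44.4, 66.7, 44.4, 66.7, 66.7], 57.8),
    "DiVLA-7B": ([100, 100, 72.7, 81.8, 100], 90.9, [66.7, 88.9, 55.6, 77.8, 66.7], 71.0),
}


def check_averages(rows, tol=0.1):
    """Return (model, split, computed_mean, reported_avg) where the two differ by more than tol."""
    mismatches = []
    for model, (id_scores, id_avg, vg_scores, vg_avg) in rows.items():
        for split, scores, reported in (
            ("in_dist", id_scores, id_avg),
            ("visual_gen", vg_scores, vg_avg),
        ):
            computed = mean(scores)  # unweighted mean over the five tasks (assumed)
            if abs(computed - reported) > tol:
                mismatches.append((model, split, round(computed, 2), reported))
    return mismatches


MISMATCHES = check_averages(ROWS)
for row in MISMATCHES:
    print(row)
```

With the values as transcribed here and a tolerance of 0.1, two cells are flagged: the ScalingDP-1B in-distribution average and the DiVLA-7B visual-generalization average. Whether these differences come from the paper or from transcription cannot be determined from this page, so the table values are left unchanged.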