NaVILA: Legged Robot Vision-Language-Action Model for Navigation
Categories: cs.RO, cs.CV
Released Date: December 5, 2024
Authors: An-Chieh Cheng¹, Yandong Ji¹, Zhaojing Yang², Xueyan Zou¹, Jan Kautz³, Erdem Bıyık², Hongxu Yin³, Sifei Liu³, Xiaolong Wang³
Affiliations: ¹UC San Diego; ²USC; ³NVIDIA

| Method | Observation inputs used (one ✓ each, from S.RGB, Pano., Depth, Odo.) | R2R Val-Unseen NE | R2R Val-Unseen OS | R2R Val-Unseen SR | R2R Val-Unseen SPL | RxR Val-Unseen NE | RxR Val-Unseen SR | RxR Val-Unseen SPL | RxR Val-Unseen nDTW |
|---|---|---|---|---|---|---|---|---|---|
| HPN+DN∗ (Krantz et al., 2021) | ✓ ✓ ✓ | 6.31 | 40.0 | 36.0 | 34.0 | - | - | - | - |
| CMA∗ (Hong et al., 2022) | ✓ ✓ ✓ | 6.20 | 52.0 | 41.0 | 36.0 | 8.76 | 26.5 | 22.1 | 47.0 |
| VLNBERT∗ (Hong et al., 2022) | ✓ ✓ ✓ | 5.74 | 53.0 | 44.0 | 39.0 | 8.98 | 27.0 | 22.6 | 46.7 |
| Sim2Sim∗ (Krantz & Lee, 2022) | ✓ ✓ ✓ | 6.07 | 52.0 | 43.0 | 36.0 | - | - | - | - |
| GridMM∗ (Wang et al., 2023c) | ✓ ✓ ✓ | 5.11 | 61.0 | 49.0 | 41.0 | - | - | - | - |
| Ego2-Map∗ (Hong et al., 2023a) | ✓ ✓ ✓ | 5.54 | 56.0 | 47.0 | 41.0 | - | - | - | - |
| DreamWalker∗ (Wang et al., 2023a) | ✓ ✓ ✓ | 5.53 | 59.0 | 49.0 | 44.0 | - | - | - | - |
| Reborn∗ (An et al., 2022) | ✓ ✓ ✓ | 5.40 | 57.0 | 50.0 | 46.0 | 5.98 | 48.6 | 42.0 | 63.3 |
| ETPNav∗ (An et al., 2024) | ✓ ✓ ✓ | 4.71 | 65.0 | 57.0 | 49.0 | 5.64 | 54.7 | 44.8 | 61.9 |
| HNR∗ (Wang et al., 2024) | ✓ ✓ ✓ | 4.42 | 67.0 | 61.0 | 51.0 | 5.50 | 56.3 | 46.7 | 63.5 |
| BEVBert∗ (An et al., 2023) | ✓ ✓ ✓ | 4.57 | 67.0 | 59.0 | 50.0 | 4.00 | 68.5 | - | 69.6 |
| HAMT+ScaleVLN∗ (Wang et al., 2023d) | ✓ ✓ ✓ | 4.80 | - | 55.0 | 51.0 | - | - | - | - |
| AG-CMTP (Chen et al., 2021a) | ✓ ✓ ✓ | 7.90 | 39.0 | 23.0 | 19.0 | - | - | - | - |
| R2R-CMTP (Chen et al., 2021a) | ✓ ✓ ✓ | 7.90 | 38.0 | 26.0 | 22.0 | - | - | - | - |
| LAW (Raychaudhuri et al., 2021) | ✓ ✓ ✓ | 6.83 | 44.0 | 35.0 | 31.0 | 10.90 | 8.0 | 8.0 | 38.0 |
| CM2 (Georgakis et al., 2022) | ✓ ✓ ✓ | 7.02 | 41.0 | 34.0 | 27.0 | - | - | - | - |
| WS-MGMap (Chen et al., 2022) | ✓ ✓ ✓ | 6.28 | 47.0 | 38.0 | 34.0 | - | - | - | - |
| AO-Planner (Chen et al., 2024c) | ✓ ✓ | 5.55 | 59.0 | 47.0 | 33.0 | 7.06 | 43.3 | 30.5 | 50.1 |
| Seq2Seq (Krantz et al., 2020a) | ✓ ✓ | 7.77 | 37.0 | 25.0 | 22.0 | 12.10 | 13.9 | 11.9 | 30.8 |
| CMA (Krantz et al., 2020a) | ✓ ✓ | 7.37 | 40.0 | 32.0 | 30.0 | - | - | - | - |
| RGB-Seq2Seq (Krantz et al., 2020a) | ✓ | 10.10 | 8.0 | 0.0 | 0.0 | - | - | - | - |
| RGB-CMA (Krantz et al., 2020a) | ✓ | 9.55 | 10.0 | 5.0 | 4.0 | - | - | - | - |
| NaVid (Zhang et al., 2024) | ✓ | 5.47 | 49.0 | 37.0 | 35.0 | - | - | - | - |
| NaVILA | ✓ | 5.22 | 62.5 | 54.0 | 49.0 | 6.77 | 49.3 | 44.0 | 58.8 |
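
The metric columns above (NE, OS, SR, SPL) follow the usual vision-and-language navigation definitions: navigation error, oracle success rate, success rate, and success weighted by path length. As a rough sketch of how such numbers are typically aggregated, the Python below assumes a 3 m success radius (the common R2R convention, not stated on this page) and a hypothetical `Episode` record; neither the names nor the threshold are taken from the NaVILA evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """One evaluation episode (hypothetical container for illustration)."""
    final_dist: float       # distance from the stop position to the goal, in metres
    min_dist: float         # smallest distance to the goal reached anywhere on the path
    path_length: float      # length of the path the agent actually travelled
    shortest_length: float  # shortest-path length from start to goal (assumed > 0)


def summarize(episodes: List[Episode], success_radius: float = 3.0) -> Dict[str, float]:
    """Aggregate NE (metres) and OS, SR, SPL (percent) over a list of episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")

    # NE: mean distance between where the agent stopped and the goal.
    ne = sum(e.final_dist for e in episodes) / n
    # OS: the agent came within the radius at any point along its path.
    oracle = sum(e.min_dist <= success_radius for e in episodes) / n
    # SR: the agent stopped within the radius.
    success = sum(e.final_dist <= success_radius for e in episodes) / n
    # SPL: success weighted by shortest / max(travelled, shortest).
    spl = sum(
        (e.final_dist <= success_radius)
        * e.shortest_length
        / max(e.path_length, e.shortest_length)
        for e in episodes
    ) / n

    return {"NE": ne, "OS": 100 * oracle, "SR": 100 * success, "SPL": 100 * spl}
```

Calling `summarize` on a list of `Episode` records returns one dictionary per evaluation split, so a row of the table would correspond to one call per dataset. nDTW is omitted here because it needs the full reference and predicted trajectories rather than per-episode scalars.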