Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation
Categories: cs.CV, cs.MM
Release date: December 9, 2024
Authors: Xuesong Zhang, Yunbo Xu, Jia Li, Zhenzhen Hu, Richang Hong
Affiliation: Hefei University of Technology

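Results on the REVERIE benchmark. TL is the trajectory length in meters; SR (success rate), SPL (success weighted by path length), RGS (remote grounding success) and RGSPL (RGS weighted by path length) are percentages. "-" marks values not reported.

| Methods | Val Seen TL | Val Seen SR | Val Seen SPL | Val Seen RGS | Val Seen RGSPL | Val Unseen TL | Val Unseen SR | Val Unseen SPL | Val Unseen RGS | Val Unseen RGSPL | Test Unseen TL | Test Unseen SR | Test Unseen SPL | Test Unseen RGS | Test Unseen RGSPL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|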
| RCM [56] | 10.70 | 23.33 | 21.82 | 16.23 | 15.36 | 11.98 | 9.29 | 6.97 | 4.89 | 3.89 | 10.60 | 7.84 | 6.67 | 3.67 | 3.14 |
| HOP+ [42] | 10.59 | 55.87 | 49.55 | 40.76 | 36.22 | 14.57 | 36.07 | 31.13 | 22.49 | 19.33 | 15.17 | 33.82 | 28.24 | 20.20 | 16.86 |
| DUET [5] | 13.86 | 71.75 | 63.94 | 57.41 | 51.14 | 22.11 | 46.98 | 33.73 | 32.15 | 23.03 | 21.30 | 52.51 | 36.06 | 31.88 | 22.06 |
| DSRG [53] | - | 75.69 | 68.09 | 61.07 | 54.72 | - | 47.83 | 34.02 | 32.69 | 23.37 | - | 54.04 | 37.09 | 32.49 | 22.18 |
| GridMM [58] | - | - | - | - | - | 23.20 | 51.37 | 36.47 | 34.57 | 24.56 | 19.97 | 53.13 | 36.60 | 34.87 | 23.45 |
| AZHP [60] | 13.60 | 70.98 | 62.24 | 56.99 | 50.14 | 22.08 | 49.02 | 36.25 | 32.41 | 24.13 | 21.10 | 52.52 | 36.11 | 32.10 | 22.54 |
| FDA [17] | - | - | - | - | - | 19.04 | 47.57 | 35.90 | 32.06 | 24.31 | 17.30 | 49.62 | 36.45 | 30.34 | 22.08 |
| CONSOLE [30] | - | 74.14 | 65.15 | 60.08 | 52.69 | - | 50.07 | 34.40 | 34.05 | 23.33 | - | 55.13 | 37.13 | 33.18 | 22.25 |
| KERM [26] | 14.25 | 71.89 | 64.04 | 57.55 | 51.22 | 21.85 | 49.02 | 34.83 | 33.97 | 24.14 | 18.38 | 52.26 | 37.46 | 32.69 | 23.15 |
| VER [34] | 16.13 | 75.83 | 66.19 | 61.71 | 56.20 | 23.03 | 55.98 | 39.66 | 33.71 | 23.70 | 24.74 | 56.82 | 38.76 | 33.88 | 23.19 |
| SUSA (Ours) | 14.60 | 76.95 | 69.07 | 61.77 | 55.86 | 22.59 | 51.75 | 38.86 | 35.02 | 26.56 | 17.86 | 54.39 | 41.54 | 36.11 | 27.31 |
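As a reading aid, the sketch below shows how the aggregate metrics in the table are typically computed from per-episode results, assuming the standard definitions. SPL averages S_i · l_i / max(p_i, l_i), where S_i is navigation success, l_i is the shortest-path length and p_i is the agent's trajectory length. RGSPL applies the same path weight to remote grounding success, which requires both reaching the target and identifying the correct object. The `Episode` record and the `path_weight` and `summarize` helpers are hypothetical names for illustration. The success flags are assumed to be precomputed by the benchmark's own criteria rather than derived here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    path_length: float       # p_i: length of the agent's trajectory (meters)
    shortest_length: float   # l_i: shortest-path length to the goal (meters)
    nav_success: bool        # S_i: navigation success, assumed precomputed
    grounding_success: bool  # whether the predicted object is the target


def path_weight(ep: Episode) -> float:
    """l_i / max(p_i, l_i): 1.0 for an optimal path, smaller for detours."""
    denom = max(ep.path_length, ep.shortest_length)
    # Guard the degenerate case where the agent starts at the goal.
    return ep.shortest_length / denom if denom > 0 else 1.0


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate TL (meters) and SR/SPL/RGS/RGSPL (percent) over episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    # Remote grounding success: navigation AND object grounding both succeed.
    rgs = [ep.nav_success and ep.grounding_success for ep in episodes]
    return {
        "TL": sum(ep.path_length for ep in episodes) / n,
        "SR": 100.0 * sum(ep.nav_success for ep in episodes) / n,
        "SPL": 100.0 * sum(ep.nav_success * path_weight(ep) for ep in episodes) / n,
        "RGS": 100.0 * sum(rgs) / n,
        "RGSPL": 100.0 * sum(r * path_weight(ep) for r, ep in zip(rgs, episodes)) / n,
    }
```

For example, an agent that reaches the goal along a 10 m trajectory when the shortest path is 8 m contributes a weight of 0.8 to SPL, and also to RGSPL if it grounds the correct object. This is why a method can match another on SR or RGS yet trail it on SPL or RGSPL when its trajectories are longer.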