RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation
cs.CV
Released Date: December 3, 2024
Authors: Changli Wu¹, Qi Chen, Jiayi Ji, Haowei Wang, Yiwei Ma, You Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji
Affiliation: ¹Xiamen University

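| Method | Unique (19%) Acc@0.25 | Unique (19%) Acc@0.5 | Unique (19%) mIoU | Multiple (81%) Acc@0.25 | Multiple (81%) Acc@0.5 | Multiple (81%) mIoU | Overall Acc@0.25 | Overall Acc@0.5 | Overall mIoU | Inference Time Stage-1 | Inference Time Stage-2 | Inference Time All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|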
| Multi-task | ||||||||||||
| EDA-box2mask [68] | 84.7 | 56.9 | - | 50.0 | 37.0 | - | 55.2 | 40.0 | 35.0 | - | - | - |
| 3DRefTR-SP [41] | 87.9 | 69.8 | - | 51.6 | 41.9 | - | 57.0 | 46.1 | 40.8 | - | - | 388ms |
| 3DRefTR-HR [41] | 89.6 | 77.0 | - | 52.3 | 43.7 | - | 57.9 | 48.7 | 41.2 | - | - | 405ms |
| UniSeg3D [69] | - | - | - | - | - | - | - | - | 29.6 | - | - | - |
| SegPoint [20] | - | - | - | - | - | - | - | - | 41.7 | - | - | - |
| Reason3D [23] | 88.4 | 84.2 | 74.6 | 50.5 | 31.7 | 34.1 | 57.9 | 41.9 | 42.0 | - | - | - |
| Single-task | ||||||||||||
| TGNN [24] | - | - | - | - | - | - | 37.5 | 31.4 | 27.8 | - | - | - |
| TGNN† [24] | 69.3 | 57.8 | 50.7 | 31.2 | 26.6 | 23.6 | 38.6 | 32.7 | 28.8 | 26862ms | 235ms | 27097ms |
| InstanceRefer† [71] | 81.6 | 72.2 | 60.4 | 29.4 | 23.5 | 21.5 | 40.2 | 33.5 | 30.6 | 509ms | 672ms | 1181ms |
| X-RefSeg3D [52] | - | - | - | - | - | - | 40.3 | 33.8 | 29.9 | - | - | - |
| 3DVG-Transformer∗ [73] | 79.5 | 58.0 | 49.9 | 42.0 | 30.8 | 27.0 | 49.3 | 36.1 | 31.4 | - | - | - |
| 3D-SPS∗ [45] | 84.8 | 65.6 | 54.7 | 41.7 | 30.8 | 26.7 | 50.1 | 37.6 | 32.1 | - | - | - |
| 3DRESTR [41] | 79.0 | 54.2 | - | 40.2 | 22.1 | - | 46.0 | 26.9 | 28.7 | - | - | - |
| 3D-STMN [63] | 89.3 | 84.0 | 74.5 | 46.2 | 29.2 | 31.1 | 54.6 | 39.8 | 39.5 | - | - | 283ms |
| RG-SAN (Ours) | 89.2 | 84.3 | 74.5 | 55.0 | 35.4 | 37.4 | 61.7 | 44.9 | 44.6 | - | - | 295ms |
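
The 0.25, 0.5, and mIoU columns can be read as threshold accuracies and a mean over per-sample mask IoUs. The sketch below shows one way such numbers are typically computed. It is an illustration only, not the authors' evaluation code. The function names `summarize_ious` and `weighted_overall`, the `>=` threshold convention, and the toy IoU values are all assumptions. The second helper checks how the Overall column relates to the Unique and Multiple subsets, taking the 19%/81% shares from the table header.

```python
from typing import Dict, Sequence


def summarize_ious(ious: Sequence[float]) -> Dict[str, float]:
    """Return Acc@0.25, Acc@0.5 and mIoU (in percent) for per-sample IoUs in [0, 1].

    Assumption: a sample counts as correct when its IoU is >= the threshold.
    """
    if not ious:
        raise ValueError("need at least one IoU value")
    n = len(ious)
    return {
        "acc@0.25": 100.0 * sum(iou >= 0.25 for iou in ious) / n,
        "acc@0.5": 100.0 * sum(iou >= 0.5 for iou in ious) / n,
        "mIoU": 100.0 * sum(ious) / n,
    }


def weighted_overall(unique: float, multiple: float, unique_share: float = 0.19) -> float:
    """Combine subset scores into an overall score, weighted by subset share.

    Assumption: unique_share is the (rounded) 19% fraction from the table header.
    """
    return unique_share * unique + (1.0 - unique_share) * multiple


# Toy per-sample IoUs (hypothetical values, not taken from the paper).
toy_metrics = summarize_ious([0.1, 0.3, 0.6, 0.8])

# RG-SAN row, Acc@0.25: Unique = 89.2, Multiple = 55.0.
rg_san_overall = weighted_overall(89.2, 55.0)
```

With the rounded shares, the weighted value for the RG-SAN Acc@0.25 row comes out near 61.5, close to the reported 61.7. The small gap is plausibly due to the 19%/81% split being rounded. Similarly, where both stage timings are listed, the All column equals their sum (for example, 509ms + 672ms = 1181ms for InstanceRefer†).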