XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
cs.CV
cs.AI
Released Date: November 20, 2024
Authors: Ziyi Wang¹, Yanbo Wang¹, Xumin Yu¹, Jie Zhou¹, Jiwen Lu¹
Aff.: ¹Department of Automation, Tsinghua University, China

| Method | ScanNet | | | | | | | | | ScanNet200 | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | B15/N4 | | | B12/N7 | | | B10/N9 | | | B170/N30 | | | B150/N50 | | |
| | hIoU | Base | Novel | hIoU | Base | Novel | hIoU | Base | Novel | hIoU | Base | Novel | hIoU | Base | Novel |
| LSeg-3D [23] | 0.0 | 64.4 | 0.0 | 0.9 | 55.7 | 0.1 | 1.8 | 68.4 | 0.9 | 1.5 | 21.1 | 0.8 | 3.0 | 20.6 | 1.6 |
| 3DGenZ [31] | 20.6 | 56.0 | 12.6 | 19.8 | 35.5 | 13.3 | 12.0 | 63.6 | 6.6 | 2.6 | 15.8 | 1.4 | 3.3 | 14.1 | 1.9 |
| 3DTZSL [5] | 10.5 | 36.7 | 6.1 | 3.8 | 36.6 | 2.0 | 7.8 | 55.5 | 4.2 | 0.9 | 4.0 | 0.5 | 0.7 | 3.8 | 0.4 |
| PLA [11] | 65.3 | 68.3 | 62.4 | 55.3 | 69.5 | 45.9 | 53.1 | 76.2 | 40.8 | 11.4 | 20.9 | 7.8 | 10.1 | 20.9 | 6.6 |
| OpenScene [34] | 65.7 | 68.8 | 62.8 | 56.8 | 61.5 | 51.7 | 54.3 | 71.8 | 43.6 | 14.2 | 22.5 | 10.4 | 15.2 | 23.5 | 11.2 |
| OV3D [20] | 72.4 | 70.2 | 74.7 | 68.5 | 74.1 | 63.7 | 64.8 | 77.6 | 55.6 | – | – | – | – | – | – |
| XMask3D | 70.0 | 69.8 | 70.2 | 61.7 | 70.2 | 55.1 | 55.7 | 76.5 | 43.8 | 18.0 | 27.8 | 13.3 | 15.5 | 24.4 | 11.4 |
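
The page does not define the hIoU column. In open-vocabulary 3D segmentation benchmarks it conventionally denotes the harmonic mean of the Base and Novel mIoU, and the numbers in the table are consistent with that reading. The sketch below works under that assumption and recomputes hIoU for the XMask3D row; the function name `harmonic_iou` and the dictionary layout are illustrative only and do not come from the paper.

```python
def harmonic_iou(base_miou: float, novel_miou: float) -> float:
    """Harmonic mean of base-class and novel-class mIoU (both in percent).

    Assumption: this is how the hIoU column is derived; the page itself
    does not state the formula.
    """
    total = base_miou + novel_miou
    if total == 0:
        # Both scores are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2 * base_miou * novel_miou / total


# (Base, Novel, reported hIoU) for the XMask3D row, copied from the table.
xmask3d_rows = {
    "B15/N4": (69.8, 70.2, 70.0),
    "B12/N7": (70.2, 55.1, 61.7),
    "B10/N9": (76.5, 43.8, 55.7),
    "B170/N30": (27.8, 13.3, 18.0),
    "B150/N50": (24.4, 11.4, 15.5),
}

for split, (base, novel, reported) in xmask3d_rows.items():
    recomputed = harmonic_iou(base, novel)
    print(f"{split}: recomputed {recomputed:.1f}, reported {reported:.1f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a method with strong Base accuracy but near-zero Novel accuracy (for example the LSeg-3D row) still gets an hIoU close to zero. Inputs in the table are rounded to one decimal, so a recomputed value can differ from the reported one by about 0.1 for some rows.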