PerLA: Perceptive 3D Language Assistant
Categories: cs.CV, cs.CL, cs.LG
Released Date: November 29, 2024
Authors: Guofeng Mei¹, Wei Lin², Luigi Riz¹, Yujiao Wu³, Fabio Poiesi¹, Yiming Wang¹
Affiliations: ¹Fondazione Bruno Kessler, Italy; ²JKU Linz, Austria; ³CSIRO, Australia
![[Uncaptioned image]](https://arxiv.org/html/2411.19774v1/x1.png)
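
C, B4, M, and R denote CIDEr, BLEU-4, METEOR, and ROUGE-L; @0.25 and @0.5 are the IoU thresholds.

| Method | ScanRefer@0.25 C | ScanRefer@0.25 B4 | ScanRefer@0.25 M | ScanRefer@0.25 R | ScanRefer@0.5 C | ScanRefer@0.5 B4 | ScanRefer@0.5 M | ScanRefer@0.5 R | Nr3D@0.5 C | Nr3D@0.5 B4 | Nr3D@0.5 M | Nr3D@0.5 R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|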
| Scan2Cap[12] | 56.82 | 34.18 | 26.29 | 55.27 | 39.08 | 23.32 | 21.97 | 44.78 | 27.47 | 17.24 | 21.80 | 49.06 |
| MORE[24] | 62.91 | 36.25 | 26.75 | 56.33 | 40.94 | 22.93 | 21.66 | 44.42 | - | - | - | - |
| SpaCap3D[53] | - | - | - | - | - | 44.02 | 25.26 | 22.33 | 33.71 | 19.92 | 22.61 | 50.50 |
| REMAN[37] | 62.01 | 36.37 | 26.76 | 56.25 | 45.00 | 26.31 | 23.13 | 46.96 | 34.81 | 20.37 | 22.71 | 50.90 |
| D3Net[7] | - | - | - | - | - | 51.67 | - | - | 35.26 | 20.42 | 22.77 | 53.38 |
| Contextual[64] | - | - | - | - | - | 46.07 | 23.40 | 23.95 | - | - | - | - |
| UniT3D[13] | - | - | - | - | 46.69 | 27.52 | 21.91 | 45.98 | - | - | - | - |
| 3DJCG[4] | 64.70 | 40.17 | 27.63 | 59.23 | 49.48 | 31.63 | 24.36 | 50.80 | 38.06 | 22.82 | 23.77 | 52.99 |
| 3D-VLP[25] | 70.73 | 41.03 | 28.14 | 59.72 | 54.94 | 32.31 | 24.83 | 51.51 | - | - | - | - |
| 3D-VisTA*[66] | - | - | - | - | 61.60 | 34.10 | 26.80 | 55.00 | - | - | - | - |
| Vote2CapDETR[8] | 71.45 | 39.34 | 28.25 | 59.63 | 61.81 | 34.46 | 26.22 | 54.40 | 43.84 | 26.68 | 25.41 | 54.43 |
| LL3DA[9] | 74.17 | 41.41 | 27.76 | 59.53 | 65.19 | 36.79 | 25.97 | 55.06 | 51.18 | 28.75 | 25.91 | 56.61 |
| LL3DA (repr.) | 71.86 | 39.57 | 27.29 | 58.37 | 63.79 | 35.67 | 25.94 | 54.56 | 48.38 | 28.36 | 25.72 | 55.66 |
| PerLA | 77.92 | 43.41 | 28.97 | 59.69 | 69.41 | 38.02 | 29.07 | 56.80 | 55.06 | 31.24 | 28.52 | 59.13 |
| w.r.t. LL3DA[9] | +3.75 | +2.00 | +1.21 | +0.16 | +4.22 | +1.23 | +2.27 | +1.74 | +3.88 | +2.49 | +2.61 | +2.52 |
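
The last row of the table reports PerLA's gain over the LL3DA[9] row, column by column. As a minimal sketch of that arithmetic, the snippet below recomputes the four ScanRefer@0.25 entries from the scores listed above. The names `signed_gains`, `perla`, and `ll3da` are illustrative choices for this example and are not taken from the paper or its code.

```python
# Scores copied from the ScanRefer@0.25 columns of the table above.
# C = CIDEr, B4 = BLEU-4, M = METEOR, R = ROUGE-L.
METRICS = ["C", "B4", "M", "R"]
ll3da = {"C": 74.17, "B4": 41.41, "M": 27.76, "R": 59.53}
perla = {"C": 77.92, "B4": 43.41, "M": 28.97, "R": 59.69}


def signed_gains(ours, baseline, metrics=METRICS):
    """Return per-metric differences as signed strings with two decimals."""
    return [f"{ours[m] - baseline[m]:+.2f}" for m in metrics]


gains = signed_gains(perla, ll3da)
print(dict(zip(METRICS, gains)))
```

Formatting with `+.2f` keeps the explicit sign used in the table and hides floating-point noise in the subtraction.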