notesum.ai
Published on December 9
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation
cs.CV
Released Date: December 9, 2024
Authors: Chun-Peng Chang¹, Alain Pagani, Didier Stricker
Aff.: ¹German Research Center for Artificial Intelligence

| Dataset | Method | B-1 | B-2 | B-3 | B-4 | CIDEr | ROUGE-L |
|---|---|---|---|---|---|---|---|
| Nr3D | Vote2Cap [11] | 0.40 | 0.28 | 0.18 | 0.12 | 0.27 | 0.30 |
| Nr3D | Vote2Cap++ [12] | 0.41 | 0.32 | 0.24 | 0.18 | 0.29 | 0.32 |
| Nr3D | Ours | 0.57 | 0.38 | 0.24 | 0.15 | 0.26 | 0.43 |
| Sr3D | Vote2Cap [11] | 0.49 | 0.45 | 0.41 | 0.38 | 2.24 | 0.46 |
| Sr3D | Vote2Cap++ [12] | 0.52 | 0.49 | 0.45 | 0.42 | 2.47 | 0.49 |
| Sr3D | Ours | 0.60 | 0.54 | 0.49 | 0.44 | 2.54 | 0.57 |
| Sr3D | Ours + "far" | 0.60 | 0.52 | 0.45 | 0.38 | 2.07 | 0.56 |
| Sr3D | Ours + "close" | 0.58 | 0.49 | 0.39 | 0.31 | 1.59 | 0.54 |
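
The B-1 to B-4 columns are read here as cumulative BLEU scores for n-gram orders 1 through 4, which is the usual convention in captioning benchmarks; the table itself does not spell this out. As a rough illustration of why these scores tend to fall as n grows, the following is a minimal sketch of sentence-level BLEU with a single reference. The helper names (`ngrams`, `clipped_precision`, `bleu`) and the example sentences are hypothetical and not taken from the paper, whose reported numbers were presumably computed at corpus level with a standard evaluation toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: candidate counts are clipped by reference counts."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def bleu(candidate, reference, max_n):
    """Cumulative BLEU up to order max_n for one candidate and one reference.

    Assumption: uniform weights and no smoothing, so any zero precision
    yields a score of 0.
    """
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Geometric mean of the n-gram precisions with uniform weights.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c >= r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_mean)


# Hypothetical spatial captions differing in a single word.
candidate = "the chair next to the table".split()
reference = "the chair close to the table".split()
scores = {f"B-{n}": round(bleu(candidate, reference, n), 2) for n in range(1, 5)}
print(scores)
```

With these two sentences, a single differing word leaves most unigrams intact but breaks every 4-gram, so the score decreases from B-1 to B-4. The same monotone pattern appears in each row of the table.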