Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling
Categories: cs.CV, cs.LG
Released Date: November 29, 2024
Authors: Qirui Wu¹, Denys Iliash¹, Daniel Ritchie², Manolis Savva¹, Angel X. Chang³
Affiliations: ¹Simon Fraser University; ²Brown University; ³Simon Fraser University, Alberta Machine Intelligence Institute (Amii)

Column groups: GT = Ground Truth Depth, Est. = Estimated Depth. Within each group, rAcc, tAcc, sAcc, and Acc are the Scene-aware Alignment metrics, followed by the Collision and Relation metrics.

| Zero-shot pose estimation | GT: rAcc | GT: tAcc | GT: sAcc | GT: Acc | GT: Collision | GT: Relation | Est.: rAcc | Est.: tAcc | Est.: sAcc | Est.: Acc | Est.: Collision | Est.: Relation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Best-matching multiview | 0.34 | 0.93 | 0.52 | 0.19 | 11.43 | 0.58 | 0.34 | 0.84 | 0.44 | 0.15 | 12.29 | 0.25 |
| ZSP [30] w/ DinoV2 | 0.37 | 0.93 | 0.57 | 0.23 | 10.48 | 0.59 | 0.42 | 0.85 | 0.55 | 0.26 | 8.29 | 0.25 |
| ZSP [30] w/ ft DinoV2 | 0.45 | 0.94 | 0.66 | 0.34 | 7.90 | 0.61 | 0.41 | 0.85 | 0.54 | 0.24 | 8.46 | 0.24 |
| GigaPose [63] | 0.36 | 0.95 | 0.71 | 0.27 | 7.91 | 0.61 | 0.36 | 0.84 | 0.64 | 0.22 | 7.44 | 0.22 |
| Ours | 0.45 | 0.95 | 0.71 | 0.33 | 7.26 | 0.61 | 0.41 | 0.84 | 0.64 | 0.25 | 7.37 | 0.25 |