LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
cs.CV
Released Date: December 9, 2024
Authors: Mingjie Xu¹, Mengyang Wu², Yuzhi Zhao³, Jason Chun Lok Li⁴, Weifeng Ou⁵
Affiliations: ¹Independent Researcher; ²The Chinese University of Hong Kong; ³City University of Hong Kong; ⁴The University of Hong Kong; ⁵Dongguan University of Technology

| Model | Recall | mRecall |
|---|---|---|
| Close-ended SGG | | |
| IMP | 16.5 | 6.5 |
| MOTIFS | 20.0 | 9.1 |
| VCTree | 20.6 | 9.7 |
| GPSNet | 17.8 | 7.0 |
| PSGFormer | 18.6 | 16.7 |
| Open-ended SGG | | |
| TextPSG | 4.8 | – |
| ASMv2 | 14.2 | 10.3 |
| LLaVA-SpaceSGG | 15.43 | 13.23 |
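
The table reports both Recall and mRecall without defining them. As a rough illustration, the sketch below follows the common scene-graph convention: Recall is the fraction of ground-truth triplets recovered overall, while mRecall averages the recall computed separately for each predicate class, so rare predicates count as much as frequent ones. This is a simplified sketch and not the paper's evaluation code. It assumes exact matching on (subject, predicate, object) labels and ignores the object localization and top-K cutoffs that real benchmarks apply; the function name and sample triplets are hypothetical.

```python
from collections import defaultdict

def recall_and_mean_recall(gt_triplets, pred_triplets):
    """Return (recall, mean_recall) for (subject, predicate, object) triplets.

    Simplifying assumption: a ground-truth triplet counts as recovered only
    if an identical triplet appears among the predictions.
    """
    predicted = set(pred_triplets)
    hits = defaultdict(int)    # recovered ground-truth triplets per predicate
    totals = defaultdict(int)  # all ground-truth triplets per predicate
    for triplet in gt_triplets:
        predicate = triplet[1]
        totals[predicate] += 1
        if triplet in predicted:
            hits[predicate] += 1
    total = sum(totals.values())
    recall = sum(hits.values()) / total if total else 0.0
    # Mean recall: average the per-predicate recalls with equal weight.
    per_predicate = [hits[p] / totals[p] for p in totals]
    mean_recall = sum(per_predicate) / len(per_predicate) if per_predicate else 0.0
    return recall, mean_recall

# Hypothetical toy data: "on" is frequent, "behind" is rare and is missed.
gt = [
    ("person", "on", "chair"),
    ("cup", "on", "table"),
    ("lamp", "on", "desk"),
    ("dog", "behind", "sofa"),
]
pred = [
    ("person", "on", "chair"),
    ("cup", "on", "table"),
    ("lamp", "on", "desk"),
]
print(recall_and_mean_recall(gt, pred))  # (0.75, 0.5)
```

In this toy case the missed rare predicate lowers mRecall (0.5) well below Recall (0.75). This illustrates why a model can rank differently on the two columns, as PSGFormer does in the table.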