notesum.ai

Published on December 9

LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations

cs.CV

Release Date: December 9, 2024

Authors: Mingjie Xu¹, Mengyang Wu², Yuzhi Zhao³, Jason Chun Lok Li⁴, Weifeng Ou⁵

Aff.: ¹Independent Researcher; ²The Chinese University of Hong Kong; ³City University of Hong Kong; ⁴The University of Hong Kong; ⁵Dongguan University of Technology

arXiv: http://arxiv.org/pdf/2412.06322v1

Model	Recall	mRecall
Close-ended SGG
IMP	16.5	6.5
MOTIFS	20.0	9.1
VCTree	20.6	9.7
GPSNet	17.8	7.0
PSGFormer	18.6	16.7
Open-ended SGG
TextPSG	4.8	–
ASMv2	14.2	10.3
LLaVA-SpaceSGG	15.43	13.23
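
The table reports Recall and mRecall (mean recall) without defining them. As a rough illustration of how the two columns differ, the sketch below computes both from (subject, predicate, object) triplets. It assumes exact string matching and a single pooled set of triplets; the table does not state the cutoff K, the matching rule, or the dataset, so the function name `recall_and_mean_recall` and the toy data are hypothetical and this is not the paper's evaluation protocol.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_triplets, predicted_triplets):
    """Return (recall, mean_recall) in percent.

    Each triplet is a (subject, predicate, object) tuple of strings.
    Assumption: a ground-truth triplet counts as recalled only if an
    identical tuple appears among the predictions (no box matching).
    """
    predicted = set(predicted_triplets)
    hits = defaultdict(int)    # matched ground-truth triplets per predicate
    totals = defaultdict(int)  # all ground-truth triplets per predicate

    for triplet in gt_triplets:
        predicate = triplet[1]
        totals[predicate] += 1
        if triplet in predicted:
            hits[predicate] += 1

    if not totals:
        return 0.0, 0.0

    # Recall pools every ground-truth triplet, so frequent predicates dominate.
    recall = 100.0 * sum(hits.values()) / sum(totals.values())
    # Mean recall averages per-predicate recall, so rare predicates weigh equally.
    mean_recall = 100.0 * sum(hits[p] / totals[p] for p in totals) / len(totals)
    return recall, mean_recall


# Toy data (hypothetical): three frequent "on" relations and one rare "holding".
gt = [
    ("person", "on", "bench"),
    ("cup", "on", "table"),
    ("dog", "on", "grass"),
    ("person", "holding", "cup"),
]
pred = [
    ("person", "on", "bench"),
    ("cup", "on", "table"),
    ("dog", "on", "grass"),
    ("person", "beside", "dog"),
]

recall, mean_recall = recall_and_mean_recall(gt, pred)
print(f"Recall: {recall:.1f}, mRecall: {mean_recall:.1f}")
# prints "Recall: 75.0, mRecall: 50.0"
```

In this toy case every "on" relation is found but the single "holding" relation is missed, so Recall stays high while mRecall drops. That gap is why a model can rank differently on the two columns of the table.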