notesum.ai

Published on November 27

VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis

Categories: cs.CV, cs.AI

Release date: November 27, 2024

Authors: Donggoo Kang¹, Dasol Jeong¹, Hyunmin Lee¹, Sangwoo Park¹, Hasil Park¹, Sunkyu Kwon¹, Yeongjoon Kim¹, Joonki Paik¹

Affiliation: ¹Chung-Ang University, Seoul, Korea

arXiv: http://arxiv.org/abs/2411.18038v1


Table: HOI detection results on HICO-DET (mAP, %) under the Default and Known Object settings, each split into Full, Rare, and Non-Rare categories. "-" means no result is reported.

Method	Default Full	Default Rare	Default Non-Rare	Known Object Full	Known Object Rare	Known Object Non-Rare
IDN[24]	24.58	20.33	25.86	27.89	23.64	29.16
HOTR[16]	25.10	17.34	27.42	-	-	-
HOI-Trans[57]	26.61	19.15	28.84	29.13	20.98	31.57
QPIC[37]	29.07	21.85	31.23	31.68	24.14	33.93
MSTR[17]	31.17	25.31	32.92	34.02	28.83	35.57
CDN[49]	32.07	27.19	33.53	34.79	29.48	36.38
STIP[53]	32.22	28.15	33.43	35.29	31.43	36.45
UPT[51]	32.62	28.62	33.81	36.08	31.41	37.47
MUREN[18]	32.87	28.67	34.12	35.52	30.88	36.91
GEN-VLKT[26]	33.75	29.25	35.10	36.78	32.75	37.99
VLM-HOI	34.25	30.22	35.20	36.88	33.30	37.75
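To make the margins in the table explicit, the short sketch below finds, for each column, the strongest reported baseline and VLM-HOI's difference from it. The scores are copied by hand from the table; the names `BASELINES`, `VLM_HOI` and `margins_over_best_baseline` are hypothetical helpers for illustration, not code from the paper.

```python
# Scores copied from the results table (mAP, %); None marks a value not reported.
COLUMNS = ["Default Full", "Default Rare", "Default Non-Rare",
           "Known Object Full", "Known Object Rare", "Known Object Non-Rare"]

BASELINES = {
    "IDN":       [24.58, 20.33, 25.86, 27.89, 23.64, 29.16],
    "HOTR":      [25.10, 17.34, 27.42, None, None, None],
    "HOI-Trans": [26.61, 19.15, 28.84, 29.13, 20.98, 31.57],
    "QPIC":      [29.07, 21.85, 31.23, 31.68, 24.14, 33.93],
    "MSTR":      [31.17, 25.31, 32.92, 34.02, 28.83, 35.57],
    "CDN":       [32.07, 27.19, 33.53, 34.79, 29.48, 36.38],
    "STIP":      [32.22, 28.15, 33.43, 35.29, 31.43, 36.45],
    "UPT":       [32.62, 28.62, 33.81, 36.08, 31.41, 37.47],
    "MUREN":     [32.87, 28.67, 34.12, 35.52, 30.88, 36.91],
    "GEN-VLKT":  [33.75, 29.25, 35.10, 36.78, 32.75, 37.99],
}
VLM_HOI = [34.25, 30.22, 35.20, 36.88, 33.30, 37.75]


def margins_over_best_baseline(baselines, ours):
    """For each column, return (column, best baseline, its score, ours - best)."""
    result = []
    for i, col in enumerate(COLUMNS):
        # Skip baselines that report no value for this column.
        reported = {name: scores[i] for name, scores in baselines.items()
                    if scores[i] is not None}
        best = max(reported, key=reported.get)
        result.append((col, best, reported[best], round(ours[i] - reported[best], 2)))
    return result


if __name__ == "__main__":
    for col, best, score, delta in margins_over_best_baseline(BASELINES, VLM_HOI):
        print(f"{col:<22} best baseline {best:<9} {score:6.2f}  delta {delta:+.2f}")
```

With these numbers, GEN-VLKT is the strongest baseline in every column. VLM-HOI leads it in five columns, with the largest gain (+0.97) on Default Rare, and trails it by 0.24 on Known Object Non-Rare.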