notesum.ai

Published on December 4

Categories: cs.CV, cs.AI, cs.CL

Release Date: December 4, 2024

Authors: Yiwu Zhong¹, Zhuoming Liu², Yin Li², Liwei Wang¹

Aff.: ¹The Chinese University of Hong Kong; ²University of Wisconsin-Madison

Video LLMs
Model	FLOPs (TB)	Prefill Time (ms)	VideoMME (wo / w-subs)	MVBench (test)	MLVU (m-avg)	EgoSchema (test)	NextQA (mc)	PerceptionTest (val)
VILA-40B [37]	-	-	60.1 / 61.1	-	-	58.0	67.9	54.0
PLLaVA-34B [77]	-	-	-	58.1	-	-	-	-
LLaVA-N-Video-32B [91]	-	-	60.2 / 63.0	-	65.5	60.9	77.3	59.4
IXC-2.5-7B [87]	-	-	55.8 / 58.8	69.1	37.3	-	71.0	34.4
LongVA-7B [88]	381.09	2186.04	52.6 / 54.3	-	56.3	-	68.3	-
LLaVA-OV-7B [30]	99.63	439.58	58.2 / 61.5	56.7	64.7	60.1	79.4	57.1
Training-free Method Applied during Inference
FastV [5]	21.24	79.56	55.9 / 60.0	55.9	61.1	57.5	77.5	56.3
LLaVA-Prumerge [59]	23.65	86.89	57.0 / 59.9	56.5	60.6	61.0	77.6	55.8
Ours	14.76	55.03	58.2 / 61.3	57.1	63.7	59.6	78.4	56.0
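
To make the efficiency columns easier to compare, the following minimal sketch computes how many times smaller each training-free row's FLOPs and prefill time are relative to a reference row. It assumes LLaVA-OV-7B is the reference, which the table layout suggests but this excerpt does not state explicitly. The names `rows`, `BASELINE`, and `reduction_factors` are hypothetical and chosen only for illustration; the numbers are copied from the table.

```python
# Efficiency figures copied from the table: (FLOPs in TB, prefill time in ms).
rows = {
    "LLaVA-OV-7B": (99.63, 439.58),
    "FastV": (21.24, 79.56),
    "LLaVA-Prumerge": (23.65, 86.89),
    "Ours": (14.76, 55.03),
}

# Assumption: LLaVA-OV-7B is the reference row for the training-free methods.
BASELINE = "LLaVA-OV-7B"


def reduction_factors(rows, baseline):
    """Return baseline / method ratios for FLOPs and prefill time."""
    base_flops, base_prefill = rows[baseline]
    return {
        name: {"flops": base_flops / flops, "prefill": base_prefill / prefill}
        for name, (flops, prefill) in rows.items()
        if name != baseline
    }


factors = reduction_factors(rows, BASELINE)
for name, ratio in factors.items():
    print(
        f"{name}: {ratio['flops']:.2f}x fewer FLOPs, "
        f"{ratio['prefill']:.2f}x shorter prefill"
    )
```

Under that assumption, the "Ours" row shows the largest reduction in both columns. The ratios say nothing about accuracy, which has to be read from the benchmark columns separately.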