Published on November 14, 2024

VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition

cs.CV

cs.AI

Released Date: November 14, 2024

Authors: Chenglin Li, Qianglong Chen, Zhi Li, Feng Tao, Yin Zhang

Arxiv: http://arxiv.org/abs/2411.09105v1

Method	OP (S1)	AP (S2)	AP (S3)	TR (S4)	SR (S5)	SR (S6)	GP (S7)	GP (S8)	GP (S9)	FP (S10)	Avg.
Random	33.2	34.0	37.1	30.3	32.7	23.9	25.0	28.2	37.6	33.9	31.6
MiniCPM-V	28.2	49.5	39.3	47.8	32.2	34.7	28.9	26.7	46.0	54.4	38.8
Video-LLaMA2	31.3	50.5	33.5	48.3	36.4	18.7	26.7	27.8	52.0	52.2	37.7
InternVideo2	31.3	50.5	33.5	48.3	36.4	18.7	26.7	27.8	52.0	52.2	37.7
Video-LLaVA	40.4	21.0	40.8	23.2	37.5	21.3	16.7	25.5	38.0	60.0	32.4
LLaVA-NEXT-Video-7B	20.4	22.5	30.7	21.0	33.8	18.7	12.2	14.4	15.3	46.7	23.6
LLaVA-NEXT-Video-34B	28.4	42.0	42.7	39.0	22.9	58.0	37.8	12.2	33.3	58.9	37.5
InternLM-XComposer-2.5	36.0	38.2	45.5	44.5	43.1	20.0	8.9	25.6	35.3	61.1	35.8
Qwen2-VL-2B	32.7	39.8	29.8	37.8	30.7	28.0	32.2	20.0	34.7	33.3	31.9
Qwen2-VL-7B	41.6	45.7	42.8	50.5	38.2	36.0	36.7	34.4	46.7	52.2	42.5
Qwen2-VL-72B	51.8	58.2	60.0	56.8	60.7	42.0	32.2	37.8	62.0	76.7	53.7
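
The Avg. column can be approximated from the per-scenario scores. The sketch below assumes Avg. is the unweighted mean over S1 to S10, which the table does not state. The helper names `unweighted_mean` and `gain_over_random` are illustrative and not from the paper. Only three rows are copied in to keep the example short.

```python
# Per-scenario scores (S1..S10) copied from three rows of the table above.
scores = {
    "Random": [33.2, 34.0, 37.1, 30.3, 32.7, 23.9, 25.0, 28.2, 37.6, 33.9],
    "Qwen2-VL-7B": [41.6, 45.7, 42.8, 50.5, 38.2, 36.0, 36.7, 34.4, 46.7, 52.2],
    "Qwen2-VL-72B": [51.8, 58.2, 60.0, 56.8, 60.7, 42.0, 32.2, 37.8, 62.0, 76.7],
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every scenario carries equal weight."""
    return sum(values) / len(values)


def gain_over_random(method):
    """Difference between a method's mean score and the Random baseline's mean."""
    return unweighted_mean(scores[method]) - unweighted_mean(scores["Random"])


for name, values in scores.items():
    print(f"{name}: mean={unweighted_mean(values):.2f}, "
          f"gain over Random={gain_over_random(name):+.2f}")
```

A recomputed mean may differ from the reported Avg. in the last digit if the original figure was derived from unrounded or differently weighted per-scenario values.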