Published on November 16

ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models

Categories: cs.CV, cs.AI

Release Date: November 16, 2024

Authors: Vipula Rawte¹, Sarthak Jain², Aarush Sinha³, Garv Kaushik⁴, Aman Bansal⁵, Prathiksha Rumale Vishwanath⁵, Samyak Rajesh Jain⁶, Aishwarya Naresh Reganti⁷, Vinija Jain⁸, Aman Chadha⁹, Amit P. Sheth, Amitava Das¹

Aff.: ¹AI Institute, University of South Carolina, USA; ²Guru Gobind Singh Indraprastha University, India; ³Vellore Institute of Technology, India; ⁴Indian Institute of Technology (BHU), India; ⁵University of Massachusetts Amherst, USA; ⁶University of California, Santa Cruz, USA; ⁷Amazon Web Services, USA; ⁸Meta, USA; ⁹Amazon GenAI, USA

arXiv: http://arxiv.org/abs/2411.10867v1

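Counts per T2V model and hallucination category (VS: Vanishing Subject, NV: Numeric Variability, TD: Temporal Dysmorphia, OE: Omission Error, PI: Physical Incongruity)
T2V Model	VS	NV	TD	OE	PI	Total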
AnimateLCM [29]	2	70	70	70	70	282
zeroscope_v2_XL [25]	18	0	37	109	199	363
Show1 [35]	13	71	88	111	55	338
MORA [34]	82	96	99	202	215	694
AnimateDiff Lightning [13]	11	33	52	56	63	215
AnimateDiff-MotionAdapter [9]	28	59	158	182	94	521
MagicTime [33]	70	70	70	69	70	349
zeroscope_v2_576w [24]	17	0	41	115	187	360
MS1.7B [1]	51	50	70	70	70	311
HotShotXL [19]	70	70	70	69	70	349
Total	362	519	755	1053	1093	3782
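
The row and column totals in the table can be re-derived from the individual cells. The following is a minimal sketch of that arithmetic, not code from the paper: the counts are copied from the table, and the names `CATEGORIES`, `COUNTS`, `row_totals`, and `column_totals` are hypothetical helpers introduced only for illustration.

```python
# Column order follows the table header: VS, NV, TD, OE, PI.
CATEGORIES = ("VS", "NV", "TD", "OE", "PI")

# Per-model counts copied from the table (bracketed citation markers omitted).
COUNTS = {
    "AnimateLCM": (2, 70, 70, 70, 70),
    "zeroscope_v2_XL": (18, 0, 37, 109, 199),
    "Show1": (13, 71, 88, 111, 55),
    "MORA": (82, 96, 99, 202, 215),
    "AnimateDiff Lightning": (11, 33, 52, 56, 63),
    "AnimateDiff-MotionAdapter": (28, 59, 158, 182, 94),
    "MagicTime": (70, 70, 70, 69, 70),
    "zeroscope_v2_576w": (17, 0, 41, 115, 187),
    "MS1.7B": (51, 50, 70, 70, 70),
    "HotShotXL": (70, 70, 70, 69, 70),
}


def row_totals(counts):
    """Sum the five category counts for each model."""
    return {model: sum(row) for model, row in counts.items()}


def column_totals(counts):
    """Sum each category across all models."""
    return {
        cat: sum(row[i] for row in counts.values())
        for i, cat in enumerate(CATEGORIES)
    }


rows = row_totals(COUNTS)
cols = column_totals(COUNTS)
grand_total = sum(rows.values())

# Summing by row and summing by column must give the same grand total.
assert grand_total == sum(cols.values())

for model, total in rows.items():
    print(f"{model}: {total}")
for cat, total in cols.items():
    print(f"{cat}: {total} ({total / grand_total:.1%} of all counts)")
print(f"Grand total: {grand_total}")
```

Running the sketch reproduces the Total row and Total column as printed in the table, and additionally reports each category's share of the grand total.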