ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
cs.CV
cs.AI
Release Date: November 16, 2024
Authors: Vipula Rawte¹, Sarthak Jain², Aarush Sinha³, Garv Kaushik⁴, Aman Bansal⁵, Prathiksha Rumale Vishwanath⁵, Samyak Rajesh Jain⁶, Aishwarya Naresh Reganti⁷, Vinija Jain⁸, Aman Chadha⁹, Amit P. Sheth, Amitava Das¹
Affiliations: ¹AI Institute, University of South Carolina, USA; ²Guru Gobind Singh Indraprastha University, India; ³Vellore Institute of Technology, India; ⁴Indian Institute of Technology (BHU), India; ⁵University of Massachusetts Amherst, USA; ⁶University of California, Santa Cruz, USA; ⁷Amazon Web Services, USA; ⁸Meta, USA; ⁹Amazon GenAI, USA
![[Uncaptioned image]](https://arxiv.org/html/2411.10867v1/x1.png)
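| T2V Model | Vanishing Subject (VS) | Numeric Variability (NV) | Temporal Dysmorphia (TD) | Omission Error (OE) | Physical Incongruity (PI) | Total |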
|---|---|---|---|---|---|---|
| AnimateLCM [29] | 2 | 70 | 70 | 70 | 70 | 282 |
| zeroscope_v2_XL [25] | 18 | 0 | 37 | 109 | 199 | 363 |
| Show1 [35] | 13 | 71 | 88 | 111 | 55 | 338 |
| MORA [34] | 82 | 96 | 99 | 202 | 215 | 694 |
| AnimateDiff Lightning [13] | 11 | 33 | 52 | 56 | 63 | 215 |
| AnimateDiff-MotionAdapter [9] | 28 | 59 | 158 | 182 | 94 | 521 |
| MagicTime [33] | 70 | 70 | 70 | 69 | 70 | 349 |
| zeroscope_v2_576w [24] | 17 | 0 | 41 | 115 | 187 | 360 |
| MS1.7B [1] | 51 | 50 | 70 | 70 | 70 | 311 |
| HotShotXL [19] | 70 | 70 | 70 | 69 | 70 | 349 |
| Total | 362 | 519 | 755 | 1053 | 1093 | 3782 |
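
The row and column totals in the table can be cross-checked with a few lines of Python. The sketch below is illustrative only: the counts are copied from the table, while the variable names (`counts`, `row_totals`, `column_totals`, `shares`) are assumptions introduced here and do not come from the paper. It also derives each category's share of the grand total, which the table itself does not report.

```python
# Counts copied from the table above; column order is VS, NV, TD, OE, PI.
# All names below are illustrative, not taken from the paper.
categories = ["VS", "NV", "TD", "OE", "PI"]

counts = {
    "AnimateLCM": [2, 70, 70, 70, 70],
    "zeroscope_v2_XL": [18, 0, 37, 109, 199],
    "Show1": [13, 71, 88, 111, 55],
    "MORA": [82, 96, 99, 202, 215],
    "AnimateDiff Lightning": [11, 33, 52, 56, 63],
    "AnimateDiff-MotionAdapter": [28, 59, 158, 182, 94],
    "MagicTime": [70, 70, 70, 69, 70],
    "zeroscope_v2_576w": [17, 0, 41, 115, 187],
    "MS1.7B": [51, 50, 70, 70, 70],
    "HotShotXL": [70, 70, 70, 69, 70],
}

# Per-model totals (the "Total" column).
row_totals = {model: sum(values) for model, values in counts.items()}

# Per-category totals (the "Total" row).
column_totals = {
    cat: sum(values[i] for values in counts.values())
    for i, cat in enumerate(categories)
}

grand_total = sum(row_totals.values())

# Derived quantity: fraction of all counted videos that falls in each category.
shares = {cat: column_totals[cat] / grand_total for cat in categories}

for cat in categories:
    print(f"{cat}: {column_totals[cat]} ({shares[cat]:.1%})")
print("Grand total:", grand_total)
```

Summing along both axes and comparing the two grand totals is a quick way to catch transcription errors when copying a table like this one.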