VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
Categories: cs.CV, cs.AI, cs.CL
Released Date: November 22, 2024
Authors: Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal
Affiliation: UNC Chapel Hill
![Uncaptioned image](https://arxiv.org/html/2411.15115v1/x1.png)
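| Method | Text-Video Alignment: Count | Text-Video Alignment: Color | Text-Video Alignment: Action | Text-Video Alignment: Others | Text-Video Alignment: Average | Visual Quality |
|---|---|---|---|---|---|---|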
| VideoCrafter2 | 47.52 | 46.28 | 44.07 | 46.20 | 46.02 | 61.89 |
| + LLM paraphrasing | 45.87 | 47.81 | 44.41 | 45.16 | 45.81 | 62.53 |
| + SLD [52] | 44.47 | 46.45 | 39.89 | 44.06 | 43.72 | 52.53 |
| + OPT2I [29] | 47.69 | 47.67 | 45.04 | 44.65 | 46.26 | 62.13 |
| + VideoRepair (Ours) | 48.67 | 48.80 | 45.47 | 46.35 | 47.32 | 62.15 |
| T2V-turbo | 46.14 | 45.05 | 41.42 | 43.16 | 43.94 | 63.54 |
| + LLM paraphrasing | 49.49 | 43.16 | 41.32 | 44.75 | 44.68 | 62.98 |
| + SLD [52] | 47.39 | 43.99 | 42.13 | 43.28 | 44.20 | 56.67 |
| + OPT2I [29] | 47.44 | 45.00 | 44.64 | 45.54 | 45.66 | 63.35 |
| + VideoRepair (Ours) | 52.51 | 51.17 | 45.79 | 46.11 | 48.89 | 63.28 |
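
The arithmetic behind the table can be checked with a short script. The sketch below assumes that the Average column is the unweighted mean of the four category scores; the source does not state this, but the reported numbers are consistent with it up to rounding. The names `ROWS`, `mean_of_categories`, `max_average_gap`, and `average_gain` are illustrative helpers, not part of the VideoRepair code.

```python
# Text-video alignment scores copied from the table above.
# Key: (backbone, method). Value: (Count, Color, Action, Others, Average).
ROWS = {
    ("VideoCrafter2", "baseline"): (47.52, 46.28, 44.07, 46.20, 46.02),
    ("VideoCrafter2", "LLM paraphrasing"): (45.87, 47.81, 44.41, 45.16, 45.81),
    ("VideoCrafter2", "SLD"): (44.47, 46.45, 39.89, 44.06, 43.72),
    ("VideoCrafter2", "OPT2I"): (47.69, 47.67, 45.04, 44.65, 46.26),
    ("VideoCrafter2", "VideoRepair"): (48.67, 48.80, 45.47, 46.35, 47.32),
    ("T2V-turbo", "baseline"): (46.14, 45.05, 41.42, 43.16, 43.94),
    ("T2V-turbo", "LLM paraphrasing"): (49.49, 43.16, 41.32, 44.75, 44.68),
    ("T2V-turbo", "SLD"): (47.39, 43.99, 42.13, 43.28, 44.20),
    ("T2V-turbo", "OPT2I"): (47.44, 45.00, 44.64, 45.54, 45.66),
    ("T2V-turbo", "VideoRepair"): (52.51, 51.17, 45.79, 46.11, 48.89),
}


def mean_of_categories(scores):
    """Unweighted mean of the four per-category scores (assumed definition)."""
    return sum(scores[:4]) / 4


def max_average_gap(rows):
    """Largest |reported Average - recomputed mean| across all rows."""
    return max(abs(s[4] - mean_of_categories(s)) for s in rows.values())


def average_gain(rows, backbone, method):
    """Reported Average of `method` minus the backbone's baseline Average."""
    return rows[(backbone, method)][4] - rows[(backbone, "baseline")][4]


if __name__ == "__main__":
    print(f"max gap vs. recomputed mean: {max_average_gap(ROWS):.4f}")
    for backbone in ("VideoCrafter2", "T2V-turbo"):
        gain = average_gain(ROWS, backbone, "VideoRepair")
        print(f"{backbone} + VideoRepair: {gain:+.2f} average alignment points")
```

Under that assumption, every reported Average is within 0.01 of the recomputed mean, and the reported averages give VideoRepair a gain of about 1.30 points over VideoCrafter2 and about 4.95 points over T2V-turbo.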