Instructional Video Generation
cs.CV
Released Date: December 5, 2024
Authors: Yayuan Li¹, Zhi Cao¹, Jason J. Corso²
Affiliations: ¹University of Michigan; ²University of Michigan and Voxel51
![[Uncaptioned image]](https://arxiv.org/html/2412.04189v1/x1.png)
| Dataset | Method | HS-Err. | GT-Frame (FID) | GT-Frame (CLIP) | GT-Video (FVD) | GT-Video (EgoVLP) | Consistency (CLIP) | Semantic (CLIP) | Semantic (BLIP) |
|---|---|---|---|---|---|---|---|---|---|
| EpicKitchens | LFDM [54] | 0.01987 | 39.37 | 92.414 | 129.80 | 0.354 | 0.9826 | 28.37 | 0.235 |
| EpicKitchens | AA [17] | 0.01908 | 5.49 | 95.882 | 171.29 | 0.338 | 0.9843 | 29.97 | 0.295 |
| EpicKitchens | AVDC [90] | 0.01969 | 140.34 | 89.176 | 81.39 | 0.197 | 0.9582 | 24.66 | 0.116 |
| EpicKitchens | PIA [93] | 0.01826 | 24.70 | 94.455 | 212.88 | 0.361 | 0.9849 | 30.06 | 0.294 |
| EpicKitchens | Ours | 0.01512 | 5.27 | 95.904 | 101.89 | 0.377 | 0.9896 | 31.14 | 0.298 |
| Ego4D | LFDM [54] | 0.02127 | 50.67 | 92.037 | 126.71 | 0.535 | 0.9821 | 26.93 | 0.221 |
| Ego4D | AA [17] | 0.02393 | 21.83 | 96.472 | 129.60 | 0.642 | 0.9894 | 28.56 | 0.260 |
| Ego4D | AVDC [90] | 0.02117 | 144.91 | 88.160 | 107.82 | 0.261 | 0.9722 | 24.17 | 0.155 |
| Ego4D | PIA [93] | 0.02393 | 34.62 | 94.574 | 104.38 | 0.603 | 0.9746 | 29.15 | 0.219 |
| Ego4D | Ours | 0.01939 | 21.51 | 96.506 | 103.15 | 0.664 | 0.9873 | 28.63 | 0.263 |
| Motion Intensive | LFDM [54] | 0.02053 | 56.95 | 92.540 | 137.44 | 0.303 | 0.9825 | 28.39 | 0.210 |
| Motion Intensive | AA [17] | 0.01764 | 23.93 | 95.909 | 115.14 | 0.368 | 0.9845 | 30.07 | 0.276 |
| Motion Intensive | AVDC [90] | 0.02143 | 148.11 | 89.327 | 85.97 | 0.204 | 0.9579 | 24.60 | 0.102 |
| Motion Intensive | PIA [93] | 0.01940 | 40.97 | 94.482 | 217.59 | 0.330 | 0.9719 | 30.09 | 0.280 |
| Motion Intensive | Ours | 0.01663 | 23.79 | 95.885 | 114.52 | 0.371 | 0.9849 | 31.12 | 0.327 |