Published on December 10
Video Motion Transfer with Diffusion Transformers
cs.CV
cs.AI
cs.LG
Released Date: December 10, 2024
Authors: Alexander Pondaven¹, Aliaksandr Siarohin², Sergey Tulyakov², Philip Torr¹, Fabio Pizzati³
Affiliations: ¹University of Oxford; ²Snap Inc.; ³MBZUAI
![[Uncaptioned image]](https://arxiv.org/html/2412.07776v1/x1.png)
CogVideoX-5B

| Method | Caption MF | Caption IQ | Subject MF | Subject IQ | Scene MF | Scene IQ | All MF | All IQ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Backbone | 0.524 | 0.315 | 0.502 | 0.321 | 0.544 | 0.318 | 0.523 | 0.318 |
| Injection [59] | 0.608 | 0.315 | 0.581 | 0.321 | 0.635 | 0.320 | 0.608 | 0.319 |
| SMM [62] | 0.782 | 0.313 | 0.741 | 0.317 | 0.776 | 0.316 | 0.766 | 0.315 |
| MOFT [59] | 0.728 | 0.313 | 0.728 | 0.321 | 0.722 | 0.319 | 0.726 | 0.318 |
| DiTFlow | 0.790 | 0.316 | 0.775 | 0.321 | 0.789 | 0.319 | 0.785 | 0.319 |

CogVideoX-2B

| Method | Caption MF | Caption IQ | Subject MF | Subject IQ | Scene MF | Scene IQ | All MF | All IQ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Backbone | 0.521 | 0.313 | 0.495 | 0.312 | 0.523 | 0.314 | 0.513 | 0.313 |
| Injection [59] | 0.546 | 0.315 | 0.524 | 0.317 | 0.563 | 0.321 | 0.544 | 0.318 |
| SMM [62] | 0.687 | 0.312 | 0.682 | 0.312 | 0.694 | 0.317 | 0.688 | 0.312 |
| MOFT [59] | 0.503 | 0.312 | 0.502 | 0.313 | 0.508 | 0.315 | 0.504 | 0.312 |
| DiTFlow | 0.685 | 0.311 | 0.753 | 0.322 | 0.739 | 0.320 | 0.726 | 0.317 |
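
For readers who want to compare methods numerically, the MF columns of the table above can be loaded into a small script. The following is only a sketch: the dictionary layout and the helper names `best_method` and `mean_of_prompt_types` are illustrative assumptions, not part of the paper. The values are copied verbatim from the table.

```python
# MF scores copied from the table above, ordered as Caption, Subject, Scene, All.
# The dictionary layout and helper names are hypothetical, chosen for illustration.
MF = {
    "CogVideoX-5B": {
        "Backbone": [0.524, 0.502, 0.544, 0.523],
        "Injection": [0.608, 0.581, 0.635, 0.608],
        "SMM": [0.782, 0.741, 0.776, 0.766],
        "MOFT": [0.728, 0.728, 0.722, 0.726],
        "DiTFlow": [0.790, 0.775, 0.789, 0.785],
    },
    "CogVideoX-2B": {
        "Backbone": [0.521, 0.495, 0.523, 0.513],
        "Injection": [0.546, 0.524, 0.563, 0.544],
        "SMM": [0.687, 0.682, 0.694, 0.688],
        "MOFT": [0.503, 0.502, 0.508, 0.504],
        "DiTFlow": [0.685, 0.753, 0.739, 0.726],
    },
}


def best_method(scores):
    """Return the method name with the highest 'All' MF score."""
    return max(scores, key=lambda method: scores[method][3])


def mean_of_prompt_types(row):
    """Unweighted mean of the Caption, Subject, and Scene scores, to 3 decimals."""
    return round(sum(row[:3]) / 3, 3)


for backbone, scores in MF.items():
    top = best_method(scores)
    gain = scores[top][3] - scores["Backbone"][3]
    print(f"{backbone}: best={top}, gain over Backbone={gain:.3f}")
```

With these numbers, DiTFlow has the highest "All" MF on both backbones. The reported "All" MF values also match the unweighted mean of the three prompt types to three decimals. That is an observation about the reported figures, not a statement of how the authors aggregated them.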