Published on December 4
DIVE: Taming DINO for Subject-Driven Video Editing
Categories: cs.CV, cs.AI
Release Date: December 4, 2024
Authors: Yi Huang¹, Wei Xiong², He Zhang², Chaoqi Chen³, Jianzhuang Liu¹, Mingfu Yan⁴, Shifeng Chen⁵
Affiliations: ¹Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; ²Adobe Research; ³Shenzhen University; ⁴University of Chinese Academy of Sciences; ⁵Shenzhen University of Advanced Technology
![[Uncaptioned image]](https://arxiv.org/html/2412.03347v1/x1.png)
| Methods | Text Alignment | Image Alignment | Temporal Consistency | User Study (%) |
| --- | --- | --- | --- | --- |
| Reference Image Guided Subject Editing | | | | |
| TokenFlow [12] | 27.76 | 60.39 | 90.18 | 5.75 |
| AnyV2V [28] | 28.13 | 78.26 | 90.52 | 16.67 |
| FLATTEN [6] | 28.79 | 69.32 | 92.09 | 5.25 |
| RAVE [25] | 28.26 | 66.25 | 91.71 | 7.25 |
| DIVE (Ours) | 29.43 | 84.27 | 92.33 | 65.08 |
| Text Guided Subject Editing | | | | |
| TokenFlow [12] | 31.87 | - | 94.21 | 17.14 |
| AnyV2V [28] | 31.05 | - | 93.73 | 5.63 |
| FLATTEN [6] | 31.55 | - | 95.35 | 14.74 |
| RAVE [25] | 31.57 | - | 95.12 | 10.22 |
| DIVE (Ours) | 32.29 | - | 95.89 | 52.27 |
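
To make the size of the gaps in the table above easier to read, the following is a minimal Python sketch that computes DIVE's margin over the strongest baseline for each metric. The numbers are transcribed from the table. The dictionary layout, the short metric keys, and the helper name `margin_over_best_baseline` are illustrative choices, not part of the paper. It also assumes that the text-guided rows report Text Alignment, Temporal Consistency, and User Study only (no Image Alignment score), and that higher is better for every metric.

```python
# Scores transcribed from the table above; user-study values are percentages.
# Assumption: higher is better for every metric.
REFERENCE_GUIDED = {
    "TokenFlow": {"text": 27.76, "image": 60.39, "temporal": 90.18, "user": 5.75},
    "AnyV2V": {"text": 28.13, "image": 78.26, "temporal": 90.52, "user": 16.67},
    "FLATTEN": {"text": 28.79, "image": 69.32, "temporal": 92.09, "user": 5.25},
    "RAVE": {"text": 28.26, "image": 66.25, "temporal": 91.71, "user": 7.25},
    "DIVE": {"text": 29.43, "image": 84.27, "temporal": 92.33, "user": 65.08},
}

# Assumption: text-guided editing has no reference image, so no "image" key.
TEXT_GUIDED = {
    "TokenFlow": {"text": 31.87, "temporal": 94.21, "user": 17.14},
    "AnyV2V": {"text": 31.05, "temporal": 93.73, "user": 5.63},
    "FLATTEN": {"text": 31.55, "temporal": 95.35, "user": 14.74},
    "RAVE": {"text": 31.57, "temporal": 95.12, "user": 10.22},
    "DIVE": {"text": 32.29, "temporal": 95.89, "user": 52.27},
}


def margin_over_best_baseline(scores, metric, ours="DIVE"):
    """Return (best baseline name, ours minus that baseline's score)."""
    baselines = {name: row[metric] for name, row in scores.items() if name != ours}
    best = max(baselines, key=baselines.get)
    return best, round(scores[ours][metric] - baselines[best], 2)


if __name__ == "__main__":
    for setting, scores in [("reference-guided", REFERENCE_GUIDED),
                            ("text-guided", TEXT_GUIDED)]:
        for metric in scores["DIVE"]:
            best, margin = margin_over_best_baseline(scores, metric)
            print(f"{setting:16s} {metric:8s} +{margin:.2f} over {best}")
```

Within each setting the user-study shares sum to 100, which is consistent with reading that column as the percentage of votes each method received.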