Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback
Categories: cs.LG, cs.AI, cs.CV
Release Date: December 3, 2024
Authors: Hiroki Furuta¹, Heiga Zen², Dale Schuurmans², Aleksandra Faust², Yutaka Matsuo¹, Percy Liang³, Sherry Yang²
Affiliations: ¹University of Tokyo; ²Google DeepMind; ³Stanford University

| Method + Reward | AI Eval (Train) | AI Eval (Test) | AI Eval (All) | Human Eval (Train) | Human Eval (Test) | Human Eval (All) |
|---|---:|---:|---:|---:|---:|---:|
| Pre-Trained | 53.02% | 51.56% | 52.66% | 19.79% | 18.13% | 19.38% |
| SFT | 55.94% | 47.50% | 53.83% | 23.65% | 13.44% | 21.09% |
| RWR-CLIP | 55.31% | 45.00% | 52.73% | 27.19% | 12.50% | 23.52% |
| RWR-HPSv2 | 52.92% | 57.50% | 54.06% | 26.04% | 21.56% | 24.92% |
| RWR-PS | 55.52% | 49.69% | 54.06% | 25.73% | 11.56% | 22.19% |
| RWR-OptFlow | 57.81% | 50.00% | 55.86% | 28.75% | 10.00% | 24.06% |
| RWR-AIF | 58.23% | 50.94% | 56.41% | 33.65% | 23.44% | 31.09% |
| DPO-CLIP | 52.29% | 54.38% | 52.81% | 24.90% | 23.44% | 24.53% |
| DPO-HPSv2 | 55.73% | 51.56% | 54.69% | 26.35% | 25.00% | 26.02% |
| DPO-PS | 53.02% | 55.31% | 53.59% | 25.94% | 22.50% | 25.08% |
| DPO-OptFlow | 54.06% | 54.06% | 54.06% | 26.88% | 24.06% | 26.17% |
| DPO-AIF | 56.56% | 55.00% | 56.17% | 36.04% | 28.13% | 34.06% |
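
As a reading aid, the Human Eval (All) column can be restated as absolute gains over the Pre-Trained row. The snippet below is a minimal sketch of that arithmetic: the scores are copied from the table above, while the helper name `gains_over_baseline` and the variable names are illustrative and do not come from the paper.

```python
# Human Eval "All" scores in percent, copied from the table above.
human_all = {
    "Pre-Trained": 19.38,
    "SFT": 21.09,
    "RWR-CLIP": 23.52,
    "RWR-HPSv2": 24.92,
    "RWR-PS": 22.19,
    "RWR-OptFlow": 24.06,
    "RWR-AIF": 31.09,
    "DPO-CLIP": 24.53,
    "DPO-HPSv2": 26.02,
    "DPO-PS": 25.08,
    "DPO-OptFlow": 26.17,
    "DPO-AIF": 34.06,
}


def gains_over_baseline(scores, baseline="Pre-Trained"):
    """Return absolute gains (percentage points) over the baseline row,
    sorted from largest to smallest. Hypothetical helper for illustration."""
    base = scores[baseline]
    gains = {
        method: round(score - base, 2)
        for method, score in scores.items()
        if method != baseline
    }
    return dict(sorted(gains.items(), key=lambda kv: kv[1], reverse=True))


gains = gains_over_baseline(human_all)
for method, gain in gains.items():
    print(f"{method:12s} {gain:+.2f} pp")
```

On these numbers, the two AIF rows come out on top: DPO-AIF at +14.68 points and RWR-AIF at +11.71 points over the Pre-Trained model. The same helper can be applied to any other column by swapping in that column's values.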