notesum.ai

Published on December 3

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

Categories: cs.LG, cs.AI, cs.CV

Release Date: December 3, 2024

Authors: Hiroki Furuta¹, Heiga Zen², Dale Schuurmans², Aleksandra Faust², Yutaka Matsuo¹, Percy Liang³, Sherry Yang²

Affiliations: ¹University of Tokyo; ²Google DeepMind; ³Stanford University

arXiv: http://arxiv.org/pdf/2412.02617v1

Method+Reward	AI Eval (Train)	AI Eval (Test)	AI Eval (All)	Human Eval (Train)	Human Eval (Test)	Human Eval (All)
Pre-Trained	53.02%	51.56%	52.66%	19.79%	18.13%	19.38%
SFT	55.94%	47.50%	53.83%	23.65%	13.44%	21.09%
RWR-CLIP	55.31%	45.00%	52.73%	27.19%	12.50%	23.52%
RWR-HPSv2	52.92%	57.50%	54.06%	26.04%	21.56%	24.92%
RWR-PS	55.52%	49.69%	54.06%	25.73%	11.56%	22.19%
RWR-OptFlow	57.81%	50.00%	55.86%	28.75%	10.00%	24.06%
RWR-AIF	58.23%	50.94%	56.41%	33.65%	23.44%	31.09%
DPO-CLIP	52.29%	54.38%	52.81%	24.90%	23.44%	24.53%
DPO-HPSv2	55.73%	51.56%	54.69%	26.35%	25.00%	26.02%
DPO-PS	53.02%	55.31%	53.59%	25.94%	22.50%	25.08%
DPO-OptFlow	54.06%	54.06%	54.06%	26.88%	24.06%	26.17%
DPO-AIF	56.56%	55.00%	56.17%	36.04%	28.13%	34.06%