notesum.ai

Published on December 9

Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Categories: cs.AI, cs.CV, cs.LG

Release Date: December 9, 2024

Authors: Meera Hahn¹, Wenjun Zeng², Nithish Kannen³, Rich Galt⁴, Kartikeya Badola⁴, Been Kim², Zi Wang⁵

Affiliations: ¹Google DeepMind, Atlanta, GA, USA; ²Google DeepMind, Seattle, WA, USA; ³Google DeepMind, Bangalore, India; ⁴Google DeepMind, London, UK; ⁵Google DeepMind, Cambridge, MA, USA

arXiv: http://arxiv.org/pdf/2412.06771v1

| Dataset | Model | T2T $\uparrow$ | I2I (DINO) $\uparrow$ | T2I (VQAScore) $\uparrow$ | NLL $\downarrow$ | DSG (T2T) $\uparrow$ |
| --- | --- | --- | --- | --- | --- | --- |
| Coco-Captions | T2I | 0.8757 $\pm 0.03$ | 0.5170 $\pm 0.16$ | 0.2976 $\pm 0.45$ | 520.0645 $\pm 161.3$ | 0.5904 $\pm 0.05$ |
| Coco-Captions | Ag1 | 0.9440 $\pm 0.02$ | 0.6269 $\pm 0.12$ | 0.5831 $\pm 0.49$ | 508.4014 $\pm 158.5$ | 0.7555 $\pm 0.08$ |
| Coco-Captions | Ag2 | 0.9461 $\pm 0.02$ | 0.6141 $\pm 0.13$ | 0.6632 $\pm 0.46$ | 481.7224 $\pm 154.5$ | 0.8344 $\pm 0.08$ |
| Coco-Captions | Ag3 | 0.9501 $\pm 0.02$ | 0.6575 $\pm 0.10$ | 0.7751 $\pm 0.39$ | 446.5679 $\pm 151.8$ | 0.9001 $\pm 0.05$ |
| ImageInWords | T2I | 0.8807 $\pm 0.02$ | 0.5154 $\pm 0.15$ | 0.3711 $\pm 0.47$ | 459.9053 $\pm 200.2$ | 0.6815 $\pm 0.70$ |
| ImageInWords | Ag1 | 0.9429 $\pm 0.02$ | 0.5548 $\pm 0.15$ | 0.5058 $\pm 0.48$ | 449.8927 $\pm 196.1$ | 0.8162 $\pm 0.08$ |
| ImageInWords | Ag2 | 0.9382 $\pm 0.02$ | 0.5645 $\pm 0.15$ | 0.5701 $\pm 0.48$ | 444.5227 $\pm 193.7$ | 0.8791 $\pm 0.07$ |
| ImageInWords | Ag3 | 0.9418 $\pm 0.02$ | 0.5875 $\pm 0.14$ | 0.6624 $\pm 0.45$ | 429.4636 $\pm 194.5$ | 0.9124 $\pm 0.06$ |
| DesignBench | T2I | 0.8740 $\pm 0.02$ | 0.5439 $\pm 0.12$ | 0.3528 $\pm 0.48$ | 320.8898 $\pm 93.7$ | 0.6074 $\pm 0.08$ |
| DesignBench | Ag1 | 0.9365 $\pm 0.02$ | 0.5943 $\pm 0.12$ | 0.6848 $\pm 0.46$ | 295.1974 $\pm 69.2$ | 0.8285 $\pm 0.08$ |
| DesignBench | Ag2 | 0.9384 $\pm 0.02$ | 0.6417 $\pm 0.11$ | 0.8553 $\pm 0.34$ | 271.2604 $\pm 81.9$ | 0.9181 $\pm 0.06$ |
| DesignBench | Ag3 | 0.9429 $\pm 0.02$ | 0.6924 $\pm 0.12$ | 0.9545 $\pm 0.21$ | 257.4352 $\pm 67.5$ | 0.9485 $\pm 0.04$ |