Published on December 9
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty
cs.AI
cs.CV
cs.LG
Released Date: December 9, 2024
Authors: Meera Hahn¹, Wenjun Zeng², Nithish Kannen³, Rich Galt⁴, Kartikeya Badola⁴, Been Kim², Zi Wang⁵
Affiliations: ¹Google DeepMind, Atlanta, GA, USA; ²Google DeepMind, Seattle, WA, USA; ³Google DeepMind, Bangalore, India; ⁴Google DeepMind, London, UK; ⁵Google DeepMind, Cambridge, MA, USA

| Dataset | Model | T2T | I2I (DINO) | T2I (VQAScore) | NLL | DSG (T2T) |
|---|---|---|---|---|---|---|
| Coco-Captions | T2I | 0.8757 | 0.5170 | 0.2976 | 520.0645 | 0.5904 |
| Coco-Captions | Ag1 | 0.9440 | 0.6269 | 0.5831 | 508.4014 | 0.7555 |
| Coco-Captions | Ag2 | 0.9461 | 0.6141 | 0.6632 | 481.7224 | 0.8344 |
| Coco-Captions | Ag3 | 0.9501 | 0.6575 | 0.7751 | 446.5679 | 0.9001 |
| ImageInWords | T2I | 0.8807 | 0.5154 | 0.3711 | 459.9053 | 0.6815 |
| ImageInWords | Ag1 | 0.9429 | 0.5548 | 0.5058 | 449.8927 | 0.8162 |
| ImageInWords | Ag2 | 0.9382 | 0.5645 | 0.5701 | 444.5227 | 0.8791 |
| ImageInWords | Ag3 | 0.9418 | 0.5875 | 0.6624 | 429.4636 | 0.9124 |
| DesignBench | T2I | 0.8740 | 0.5439 | 0.3528 | 320.8898 | 0.6074 |
| DesignBench | Ag1 | 0.9365 | 0.5943 | 0.6848 | 295.1974 | 0.8285 |
| DesignBench | Ag2 | 0.9384 | 0.6417 | 0.8553 | 271.2604 | 0.9181 |
| DesignBench | Ag3 | 0.9429 | 0.6924 | 0.9545 | 257.4352 | 0.9485 |
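
To make the size of the gaps in the table easier to read, the following minimal sketch computes the relative change of the Ag3 rows against the T2I rows for two of the reported metrics. The numbers are copied from the table. The names `RESULTS` and `relative_change` are hypothetical helpers introduced for illustration. The sketch also assumes that a lower NLL is better and a higher VQAScore is better, which the table itself does not state.

```python
# Values copied from the results table: (T2I row, Ag3 row) per dataset.
# RESULTS and relative_change are illustrative names, not from the paper.
RESULTS = {
    "Coco-Captions": {
        "T2I (VQAScore)": (0.2976, 0.7751),
        "NLL": (520.0645, 446.5679),
    },
    "ImageInWords": {
        "T2I (VQAScore)": (0.3711, 0.6624),
        "NLL": (459.9053, 429.4636),
    },
    "DesignBench": {
        "T2I (VQAScore)": (0.3528, 0.9545),
        "NLL": (320.8898, 257.4352),
    },
}


def relative_change(baseline: float, value: float) -> float:
    """Return (value - baseline) / baseline as a signed fraction."""
    return (value - baseline) / baseline


if __name__ == "__main__":
    for dataset, metrics in RESULTS.items():
        for metric, (t2i, ag3) in metrics.items():
            change = relative_change(t2i, ag3)
            # Assumption: a negative change is an improvement for NLL,
            # a positive change is an improvement for VQAScore.
            print(f"{dataset:14s} {metric:15s} {change:+.1%}")
```

The same helper can be applied to the Ag1 and Ag2 rows, or to the other columns, by extending the dictionary with the corresponding table values.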