notesum.ai
Published on November 5
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
cs.CV
cs.AI
Released Date: November 5, 2024
Authors: Tariq Berrada Ifriqi, Pietro Astolfi, Melissa Hall, Reyhane Askari-Hemmat¹, Yohann Benchetrit, Marton Havasi, Matthew Muckley, Karteek Alahari, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal¹
Aff.: ¹FAIR at Meta

| ImageNet-1k | CC12M | ||||||
| 256 | 512 | 256 | 512 | ||||
| FID | FID | FID | CLIP | FID | FID | CLIP | |
| Results taken from references | |||||||
| UNet (SD/LDM-G4) [39] | — | 24 | — | — | |||
| DiT-XL/2 w/ LN [38] | — | — | — | — | — | ||
| mDT-v2-XL/2 w/ LN [15] | — | — | — | — | — | — | |
| PixArt-α-XL/2 [7] | — | — | — | — | — | — | |
| mmDiT-XL/2 (SD3) [14] | — | — | — | 22.4 | — | * | — |
| Our re-implementation of existing architectures | |||||||
| UNet (SDXL) | |||||||
| DiT-XL/2 w/ LN | — | — | — | — | — | ||
| DiT-XL/2 w/ Att | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | |
| mDT-v2-XL/2 w/ LN | — | — | — | — | — | ||
| PixArt-α-XL/2 | ✗ | ✗ | ✗ | ✗ | ✗ | ||
| mmDiT-XL/2 (SD3) | |||||||
| Our improved architecture and training | |||||||
| mmDiT-XL/2 (ours) | |||||||