notesum.ai
Published at December 9
[MASK] is All You Need
cs.CV
cs.AI
Released Date: December 9, 2024
Authors: Vincent Tao Hu1, Björn Ommer
Aff.: 1CompVis @ LMU Munich

| Model | FID | Type | Training datasets | #Params |
|---|---|---|---|---|
| Generative model trained on external large dataset (zero-shot) | | | | |
| LAFITE [89] | 26.94 | GAN | CC3M (3M) | 75M + 151M (TE) |
| Parti [81] | 7.23 | Autoregressive | LAION (400M) + FIT (400M) + JFT (4B) | 20B + 630M (AE) |
| Re-Imagen [12] | 6.88 | Continuous diffusion | KNN-ImageText (50M) | 2.5B + 750M (SR) |
| Generative model trained on external large dataset with access to MS-COCO | | | | |
| Re-Imagen‡ [12] | 5.25 | Diffusion | KNN-ImageText (50M) | 2.5B + 750M (SR) |
| Make-A-Scene [21] | 7.55 | Autoregressive | Union datasets (with MS-COCO) (35M) | 4B |
| VQ-Diffusion† [26] | 13.86 | Discrete diffusion | Conceptual Caption Subset (7M) | 370M |
| Generative model trained on MS-COCO | | | | |
| U-Net | 7.32 | Continuous diffusion | MS-COCO (83K) | 53M + 123M (TE) + 84M (AE) |
| U-ViT [6] | 5.48 | Continuous diffusion | MS-COCO (83K) | 58M + 123M (TE) + 84M (AE) |
| VQ-Diffusion [26] | 19.75 | Discrete diffusion | MS-COCO (83K) | 370M |
| Implicit Timestep Model (Ours, w/ 20 steps) | 8.11 | Discrete diffusion & MGM | MS-COCO (83K) | 77M + 123M (TE) + 84M (AE) |
| Implicit Timestep Model (Ours) | 5.65 | Discrete diffusion & MGM | MS-COCO (83K) | 77M + 123M (TE) + 84M (AE) |
| Explicit Timestep Model (Ours) | 6.03 | Discrete diffusion & MGM | MS-COCO (83K) | 77M + 123M (TE) + 84M (AE) |
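
The #Params column lists component sizes separately, which makes totals hard to compare at a glance. The sketch below sums the terms of a #Params cell. It assumes the suffixes M and B mean millions and billions, and that the parenthesised tags (TE, AE, SR) mark auxiliary components such as a text encoder, autoencoder, or super-resolution module whose weights count toward the total. The helper name `total_params` and the `PARAMS` dictionary are illustrative, not part of the paper.

```python
import re

# "#Params" cells for the MS-COCO-trained rows, copied from the table.
PARAMS = {
    "U-Net": "53M + 123M (TE) + 84M (AE)",
    "U-ViT": "58M + 123M (TE) + 84M (AE)",
    "VQ-Diffusion": "370M",
    "Implicit Timestep Model": "77M + 123M (TE) + 84M (AE)",
}

# Assumed meaning of the size suffixes used in the table.
_UNIT = {"M": 1e6, "B": 1e9}


def total_params(entry: str) -> float:
    """Sum every '<number><M|B>' term found in a #Params cell."""
    terms = re.findall(r"(\d+(?:\.\d+)?)([MB])", entry)
    return sum(float(number) * _UNIT[unit] for number, unit in terms)


if __name__ == "__main__":
    for model, cell in PARAMS.items():
        print(f"{model}: {total_params(cell) / 1e6:.0f}M total")
```

Under these assumptions the proposed models total 284M parameters, against 260M for U-Net and 265M for U-ViT, so the FID comparison among the MS-COCO-trained continuous and discrete models is made at a broadly similar overall size.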