JetFormer: An Autoregressive Generative Model of Raw Images and Text
cs.LG
cs.AI
cs.CV
Released Date: November 29, 2024
Authors: Michael Tschannen¹, André Susano Pinto, Alexander Kolesnikov¹
Aff.: ¹Google DeepMind

| Model | Extra step | FID | Precision | Recall | NLL |
| --- | --- | --- | --- | --- | --- |
| BigGAN-deep (Brock et al., 2018) | – | 6.95 | 0.87 | 0.28 | |
| ADM-G (Dhariwal & Nichol, 2021) | – | 4.59 | 0.82 | 0.52 | |
| LDM-4-G (Rombach et al., 2022) | VAE | 3.60 | 0.87 | 0.48 | |
| VQGAN (Esser et al., 2020) | VQ-VAE | 5.20 | | | |
| ViT-VQGAN (Yu et al., 2022a) | VQ-VAE | 3.04 | | | |
| GIVT-Causal (Tschannen et al., 2024) | VAE | 3.35 | 0.84 | 0.53 | |
| JetFormer-B | – | 7.25 | 0.72 | 0.44 | 3.06 |
| JetFormer-L | – | 6.64 | 0.69 | 0.56 | 3.05 |