Published on November 19
PoM: Efficient Image and Video Generation with the Polynomial Mixer
cs.CV
cs.AI
Released Date: November 19, 2024
Authors: David Picard¹, Nicolas Dufour²
Aff.: ¹LIGM, Ecole Nationale des Ponts et Chaussées, IP Paris, Univ Gustave Eiffel, CNRS, France; ²LIGM, Ecole Nationale des Ponts et Chaussées, IP Paris, Univ Gustave Eiffel, CNRS, France; LIX, Ecole Polytechnique, IP Paris, CNRS, France
![[Uncaptioned image]](https://arxiv.org/html/2411.12663v1/extracted/6010135/images/green_apple.png)
| Model | Sample config | #train | FID | IS | Precision | Recall |
|---|---|---|---|---|---|---|
| Mask-GIT [8] | - | - | 6.18 | 182.1 | 0.80 | 0.51 |
| DIFFUSSM-XL† [79] | 250 steps DDPM | 660M | 2.28 | 259.1 | 0.86 | 0.56 |
| DiM-H† [69] | 25 steps DPM++ | 480M | 2.21 | - | - | - |
| ADM-G [16] | 250 steps DDIM | 500M | 4.59 | 186.7 | 0.83 | 0.53 |
| LDM-4-G [63] | 250 steps DDIM | 215M | 3.60 | 247.7 | 0.87 | 0.48 |
| RIN [40] | 1000 steps DDPM | 600M | 3.42 | 182.0 | - | - |
| DiT-XL/2† [58] | 250 steps DDPM | 1.8B | 2.27 | 278.2 | 0.83 | 0.57 |
| SiT-XL/2† [56] | 125 steps Heun | 1.8B | 2.15 | 254.9 | 0.81 | 0.60 |
| DiPoM-XL/2 (ours) | 250 steps DDIM | 950M | 2.46 | 240.6 | 0.78 | 0.60 |
| DiPoM-XL/2 (ours) | 125 steps Heun | 950M | 3.70 | 255.2 | 0.79 | 0.56 |