DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Categories: cs.CV, cs.AI
Released Date: November 6, 2024
Authors: Hao Phung¹, Quan Dao², Trung Dao³, Hoang Phan⁴, Dimitris Metaxas, Anh Tran³
Affiliations: ¹Cornell University; ²Rutgers University; ³VinAI Research; ⁴New York University
| Model | NFE | FID | Recall | Epochs |
|---|---|---|---|---|
| Ours | 61 | 4.62 | 0.52 | 225 |
| Zigma† [25] | 65 | 7.66 | 0.40 | 400 |
| LFM-8 [5] | 89 | 5.26 | 0.46 | 500 |
| LDM-4 [51] (ADM) | 500 | 5.11 | 0.49 | 600 |
| LDM-8 (ADM)‡ | 250 | 15.37 | - | 500 |
| LDM-8 (DiT)‡ | 250 | 10.21 | - | 500 |
| LSGM [60] | 23 | 7.22 | - | 1K |
| WaveDiff [49] | 2 | 5.94 | 0.37 | 500 |
| DDGAN [64] | 2 | 7.64 | 0.36 | 800 |
| RDUOT [6] | 2 | 5.60 | 0.38 | - |
| Score SDE [58] | 4000 | 7.23 | - | 6.2K |