Factorized Visual Tokenization and Generation
cs.CV
Release Date: November 25, 2024
Authors: Zechen Bai¹, Jianxiong Gao², Ziteng Gao¹, Pichao Wang³, Zheng Zhang³, Tong He³, Mike Zheng Shou¹
Affiliations: ¹Show Lab, National University of Singapore; ²Fudan University; ³Amazon

| Method | Downsample Ratio | Codebook Size | Code Dim | rFID | PSNR (dB) |
|---|---|---|---|---|---|
| VQGAN [6] | 16 | 16384 | 256 | 4.98 | - |
| SD-VQGAN [23] | 16 | 16384 | 4 | 5.15 | - |
| RQ-VAE [12] | 16 | 16384 | 256 | 3.20 | - |
| LlamaGen [25] | 16 | 16384 | 8 | 2.19 | 20.79 |
| TiTok-B [36] | - | 4096 | 12 | 1.70 | - |
| VQGAN-LC [41] | 16 | 100000 | 8 | 2.62 | 23.80 |
| VQ-KD [30] | 16 | 8192 | 32 | 3.41 | - |
| VILA-U [31] | 16 | 16384 | 256 | 1.80 | - |
| Open-MAGVIT2 [15] | 16 | 262144 | 1 | 1.17 | 21.90 |
| FQGAN-Dual | 16 | 16384 × 2 | 8 | 0.94 | 22.02 |
| FQGAN-Triple | 16 | 16384 × 3 | 8 | 0.76 | 22.73 |
| SD-VAE† [23] | 8 | - | 4 | 0.74 | 25.68 |
| SDXL-VAE† [19] | 8 | - | 4 | 0.68 | 26.04 |
| ViT-VQGAN [33] | 8 | 8192 | 32 | 1.28 | - |
| VQGAN∗ [6] | 8 | 16384 | 4 | 1.19 | 23.38 |
| SD-VQGAN∗ [23] | 8 | 16384 | 4 | 1.14 | - |
| OmniTokenizer [29] | 8 | 8192 | 8 | 1.11 | - |
| LlamaGen [25] | 8 | 16384 | 8 | 0.59 | 25.45 |
| Open-MAGVIT2 [15] | 8 | 262144 | 1 | 0.34 | 26.19 |
| FQGAN-Dual | 8 | 16384 × 2 | 8 | 0.32 | 26.27 |
| FQGAN-Triple | 8 | 16384 × 3 | 8 | 0.24 | 27.58 |
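In the table, FQGAN's codebook is listed as 16384 × 2 (Dual) or 16384 × 3 (Triple), i.e., two or three sub-codebooks of 16384 entries each, with code dimension 8. The sketch below illustrates one way a factorized lookup of this kind can work. It assumes that the latent at each spatial position is split into equal slices, one per sub-codebook, and that each slice is matched to its nearest entry by squared L2 distance. The paper's actual encoder layout, training losses, and regularization are not modeled. The helper names (`nearest_code`, `factorized_quantize`, `combined_vocab`, `token_grid`) are hypothetical, and the 256×256 input size used with `token_grid` is an assumption chosen only to illustrate the Downsample Ratio column.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def factorized_quantize(
    z: Sequence[float], sub_codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector]:
    """Quantize one latent vector with k independent sub-codebooks.

    Assumption (illustrative, not necessarily FQGAN's layout): z is split into
    k equal slices and slice j is matched only against sub-codebook j.
    Returns one index per sub-codebook and the concatenated quantized vector.
    """
    k = len(sub_codebooks)
    if k == 0 or len(z) % k != 0:
        raise ValueError("latent length must be a positive multiple of the number of sub-codebooks")
    d = len(z) // k  # per-slice code dimension (8 for FQGAN in the table)
    indices: List[int] = []
    z_q: Vector = []
    for j, codebook in enumerate(sub_codebooks):
        if any(len(code) != d for code in codebook):
            raise ValueError(f"sub-codebook {j} entries must have dimension {d}")
        chunk = z[j * d : (j + 1) * d]
        idx = nearest_code(chunk, codebook)
        indices.append(idx)
        # Plain lookup only; training tricks such as the straight-through estimator are omitted.
        z_q.extend(codebook[idx])
    return indices, z_q


def combined_vocab(codebook_size: int, num_sub_codebooks: int) -> int:
    """Number of distinct index tuples per position for k equal-size sub-codebooks."""
    return codebook_size ** num_sub_codebooks


def token_grid(image_size: int, downsample: int) -> Tuple[int, int]:
    """Spatial token grid for a square image; image_size is an assumed input resolution."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsample ratio")
    side = image_size // downsample
    return side, side


if __name__ == "__main__":
    # Toy sub-codebooks: 2 entries of dimension 2 each (the table uses 16384 entries of dimension 8).
    books = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[5.0, 5.0], [-5.0, -5.0]],
    ]
    indices, z_q = factorized_quantize([0.9, 1.1, -4.0, -6.0], books)
    print(indices, z_q)  # [1, 1] [1.0, 1.0, -5.0, -5.0]
    print(combined_vocab(16384, 2))  # 268435456
    print(token_grid(256, 16), token_grid(256, 8))  # (16, 16) (32, 32)
```

In this sketch, each spatial position carries k indices. The number of distinct code combinations per position therefore grows as 16384^k, while each individual nearest-neighbor search still scans only 16384 entries.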