Adaptive Length Image Tokenization via Recurrent Allocation
cs.CV
cs.AI
cs.LG
cs.RO
Release Date: November 4, 2024
Authors: Shivam Duggal¹, Phillip Isola¹, Antonio Torralba¹, William T. Freeman¹
Affiliation: ¹MIT CSAIL

| Approach | ImageNet100 | | | | | | | | COCO | | | Wikipedia (WIT) | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 32 | 64 | 96 | 128 | 160 | 192 | 224 | 256 | 64 | 128 | 256 | 64 | 128 | 256 |
| Titok-L-32 | 11.60 | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Titok-B-64 | - | 8.22 | - | - | - | - | - | - | 9.15 | - | - | 42.86 | - | - |
| Titok-S-128 | - | - | - | 8.22 | - | - | - | - | - | 9.15 | - | - | 38.16 | - |
| VQ-GAN | - | - | - | - | - | - | - | 7.04 | - | - | 7.77 | - | - | 31.27 |
| Ours-S | 22.57 | 16.17 | 13.30 | 11.69 | 10.22 | 9.30 | 8.55 | 8.25 | 22.28 | 14.22 | 9.72 | 61.77 | 47.91 | 38.45 |
| Ours-SemiLarge | 19.70 | 13.92 | 11.39 | 10.41 | 9.23 | 8.75 | 8.22 | 8.03 | - | - | - | - | - | - |