An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
cs.LG
Release date: November 26, 2024
Authors: Yunzhe Hu¹, Difan Zou², Dong Xu¹
Affiliations: ¹School of Computing and Data Science, The University of Hong Kong; ²School of Computing and Data Science & Institute of Data Science, The University of Hong Kong

| Models | CIFAR-10, cross-entropy | CIFAR-10, + SRR regularization (L=12) | CIFAR-100, cross-entropy | CIFAR-100, + SRR regularization (L=12) |
|---|---|---|---|---|
| CRATE-C | 76.87 | 77.61 | 43.40 | 44.53 |
| CRATE-N | 81.52 | 81.91 | 55.11 | 55.62 |
| CRATE-T | 85.49 | 85.52 | 60.59 | 60.69 |
| CRATE | 86.67 | 86.79 | 62.40 | 62.52 |
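The effect of adding SRR regularization is easier to see as a per-model difference than as raw scores. The sketch below simply subtracts the cross-entropy column from the "+ SRR regularization (L=12)" column for each dataset. The numbers are copied from the table; the table does not name its metric, so treating them as test accuracy in percent is an assumption. `RESULTS` and `srr_gains` are hypothetical names used only for this illustration.

```python
# Scores copied from the table above as (cross-entropy, + SRR regularization (L=12)).
# Assumption: these are accuracy percentages; the table itself does not state the metric.
RESULTS = {
    "CRATE-C": {"CIFAR-10": (76.87, 77.61), "CIFAR-100": (43.40, 44.53)},
    "CRATE-N": {"CIFAR-10": (81.52, 81.91), "CIFAR-100": (55.11, 55.62)},
    "CRATE-T": {"CIFAR-10": (85.49, 85.52), "CIFAR-100": (60.59, 60.69)},
    "CRATE": {"CIFAR-10": (86.67, 86.79), "CIFAR-100": (62.40, 62.52)},
}


def srr_gains(results):
    """Return {model: {dataset: gain}}, where gain = SRR-regularized score minus
    cross-entropy score, rounded to 2 decimals to suppress float noise."""
    return {
        model: {ds: round(srr - ce, 2) for ds, (ce, srr) in per_ds.items()}
        for model, per_ds in results.items()
    }


if __name__ == "__main__":
    # Print one line per model with the signed gain on each dataset.
    for model, gains in srr_gains(RESULTS).items():
        print(f"{model:8s} " + "  ".join(f"{ds}: {g:+.2f}" for ds, g in gains.items()))
```

By this arithmetic every variant improves on both datasets. The gains range from +0.03 (CRATE-T on CIFAR-10) to +1.13 (CRATE-C on CIFAR-100), and the weaker variants gain more than full CRATE.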