Published on November 28
On the effectiveness of discrete representations in sparse mixture of experts
cs.LG
Release date: November 28, 2024
Authors: Giang Do¹, Kha Pham¹, Hung Le¹, Truyen Tran¹
Affiliation: ¹Applied Artificial Intelligence Institute (A2I2), Deakin University

| Method | FLOPs (x) | Transformer: SST-2 | Transformer: SST-5 | Transformer: IMDB | Transformer: BANKING77 | Transformer-XL: SST-2 | Transformer-XL: SST-5 | Transformer-XL: IMDB | Transformer-XL: BANKING77 |
|---|---|---|---|---|---|---|---|---|---|
| VQMoE | 5.6145 | 82.6 | 41.1 | 89.5 | 84.8 | 83.3 | 42.0 | 89.1 | 85.3 |
| SMoE | 7.7620 | 82.1 | 39.5 | 89.3 | 82.6 | 80.8 | 40.4 | 88.6 | 80.2 |
| SMoE-Dropout | 7.7620 | 81.3 | 39.6 | 88.9 | 77.9 | 81.8 | 40.0 | 89.1 | 77.3 |
| XMoE | 7.7620 | 82.4 | 39.9 | 89.0 | 83.1 | 81.3 | 40.3 | 88.7 | 82.7 |
| StableMoE | 7.7620 | 82.2 | 40.4 | 89.1 | 82.7 | 82.5 | 41.1 | 88.5 | 78.6 |
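As a reading aid, the Python sketch below recomputes a few comparisons from the table: VQMoE's FLOPs relative to the baselines (which all report 7.7620), the average score per backbone, and which method tops each dataset column. It assumes every dataset column is a score where higher is better. The table does not name the metric, so accuracy is a plausible reading but still an assumption. The names `RESULTS`, `flops_ratio`, `backbone_average` and `column_winners` are hypothetical helpers, not code from the paper.

```python
"""Summarize the VQMoE results table: relative FLOPs and per-backbone scores.

Assumption: each dataset column is a score where higher is better; the table
does not name the metric. All helper names here are hypothetical.
"""
from statistics import mean

DATASETS = ["SST-2", "SST-5", "IMDB", "BANKING77"]

# (FLOPs, Transformer scores, Transformer-XL scores), copied from the table.
RESULTS = {
    "VQMoE":        (5.6145, [82.6, 41.1, 89.5, 84.8], [83.3, 42.0, 89.1, 85.3]),
    "SMoE":         (7.7620, [82.1, 39.5, 89.3, 82.6], [80.8, 40.4, 88.6, 80.2]),
    "SMoE-Dropout": (7.7620, [81.3, 39.6, 88.9, 77.9], [81.8, 40.0, 89.1, 77.3]),
    "XMoE":         (7.7620, [82.4, 39.9, 89.0, 83.1], [81.3, 40.3, 88.7, 82.7]),
    "StableMoE":    (7.7620, [82.2, 40.4, 89.1, 82.7], [82.5, 41.1, 88.5, 78.6]),
}


def flops_ratio(method: str, baseline: str = "SMoE") -> float:
    """FLOPs of `method` divided by FLOPs of `baseline` (independent of units)."""
    return RESULTS[method][0] / RESULTS[baseline][0]


def backbone_average(method: str, backbone: int) -> float:
    """Mean over the four datasets; backbone 1 = Transformer, 2 = Transformer-XL."""
    return mean(RESULTS[method][backbone])


def column_winners(backbone: int) -> dict[str, list[str]]:
    """For each dataset, list every method reaching the top score (ties kept)."""
    winners = {}
    for i, name in enumerate(DATASETS):
        best = max(row[backbone][i] for row in RESULTS.values())
        winners[name] = [m for m, row in RESULTS.items() if row[backbone][i] == best]
    return winners


if __name__ == "__main__":
    ratio = flops_ratio("VQMoE")
    print(f"VQMoE uses {ratio:.1%} of SMoE's FLOPs ({1 - ratio:.1%} fewer)")
    for label, idx in (("Transformer", 1), ("Transformer-XL", 2)):
        averages = {m: round(backbone_average(m, idx), 2) for m in RESULTS}
        print(label, "averages:", averages)
        print(label, "column winners:", column_winners(idx))
```

Run as a script, it reports that VQMoE uses about 72.3% of the baselines' FLOPs and that it matches or exceeds every baseline in all eight dataset columns, tying with SMoE-Dropout on Transformer-XL IMDB.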