MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
Release Date: October 18, 2024
Authors: Rachel S. Y. Teo, Tan M. Nguyen¹
Affiliation: ¹Department of Mathematics, National University of Singapore

| Model | Parameters | Clean WikiText-103 Valid PPL | Clean WikiText-103 Test PPL | Attacked WikiText-103 Valid PPL | Attacked WikiText-103 Test PPL |
|---|---|---|---|---|---|
| SMoE-medium (baseline) | 216M | 33.76 | 35.55 | 42.24 | 44.19 |
| MomentumSMoE-medium | 216M | 32.29 | 33.46 | 40.94 | 42.33 |
| AdamSMoE-medium | 216M | 31.59 | 33.25 | 39.27 | 41.11 |
| SMoE-large (baseline) | 388M | 29.31 | 30.33 | 36.77 | 37.83 |
| MomentumSMoE-large | 388M | 27.58 | 29.03 | 35.21 | 36.78 |
| GLaM-medium (baseline) | 220M | 36.37 | 37.71 | 45.83 | 47.61 |
| MomentumGLaM-medium | 220M | 33.87 | 35.29 | 42.15 | 43.64 |
| AdamGLaM-medium | 220M | 32.99 | 34.32 | 41.09 | 42.81 |
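
The table compares baseline SMoE/GLaM layers against variants that add momentum-style updates. As a rough illustration of the idea named in the title, the sketch below wraps a generic SMoE layer with a heavy-ball-style momentum term on the residual update. The class name `MomentumSMoEBlock`, the hyperparameters `mu` and `gamma`, and the exact sign/placement of the momentum state are assumptions for illustration only, not the paper's implementation.

```python
import torch
import torch.nn as nn


class MomentumSMoEBlock(nn.Module):
    """Hypothetical sketch: a heavy-ball-style momentum update wrapped
    around an SMoE layer's residual step. Formulation details (signs,
    hyperparameter names, where the momentum state lives) are assumed."""

    def __init__(self, smoe_layer: nn.Module, mu: float = 0.7, gamma: float = 1.0):
        super().__init__()
        self.smoe = smoe_layer   # any expert-routing layer mapping (B, T, D) -> (B, T, D)
        self.mu = mu             # momentum coefficient (assumed hyperparameter)
        self.gamma = gamma       # step size on the accumulated direction (assumed)

    def forward(self, x: torch.Tensor, p: torch.Tensor | None = None):
        # Treat the SMoE output as a descent-like direction, accumulate it
        # with momentum, then take a residual step along the accumulated term.
        if p is None:
            p = torch.zeros_like(x)
        p = self.mu * p + self.smoe(x)   # heavy-ball accumulation
        x = x + self.gamma * p           # residual update using the momentum state
        return x, p
```

In this sketch the momentum state `p` is threaded from one block to the next, so a stack of such blocks would pass `(x, p)` through the layers; with `mu = 0` it reduces to the plain residual SMoE update used by the baselines in the table.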