MH-MoE: Multi-Head Mixture-of-Experts
cs.CL
Release Date: November 25, 2024
Authors: Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei
Affiliation: Microsoft Research
Validation perplexity on RedPajama, Wiki, and C4 (lower is better):

| Model | Training Steps | RedPajama | Wiki | C4 |
|---|---|---|---|---|
| Dense | 50,000 | 13.01 | 12.95 | 17.41 |
| SMoE | 50,000 | 11.87 | 10.51 | 15.63 |
| Fine-grained SMoE | 50,000 | 11.68 | 10.18 | 15.21 |
| MH-MoE (head=2) | 50,000 | 11.60 | 10.11 | 15.11 |
| MH-MoE (head=3) | 50,000 | 11.45 | 10.00 | 14.90 |
| Dense | 100,000 | 12.13 | 11.58 | 16.21 |
| SMoE | 100,000 | 10.90 | 9.68 | 14.35 |
| Fine-grained SMoE | 100,000 | 10.74 | 9.38 | 13.97 |
| MH-MoE (head=2) | 100,000 | 10.70 | 9.26 | 13.80 |
| MH-MoE (head=3) | 100,000 | 10.51 | 9.18 | 13.63 |