notesum.ai
Published on October 23
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference
Categories: cs.AI, cs.MA
Release date: October 23, 2024
Authors: Xin He¹, Shunkang Zhang², Yuxin Wang³, Haiyan Yin¹, Zihao Zeng⁴, Shaohuai Shi⁵, Zhenheng Tang², Xiaowen Chu⁶, Ivor Tsang¹, Ong Yew Soon¹
Affiliations: ¹CFAR, Agency for Science, Technology and Research (A*STAR); ²Hong Kong University of Science and Technology; ³Hong Kong Baptist University; ⁴Nanyang Technological University; ⁵Harbin Institute of Technology, Shenzhen; ⁶Hong Kong University of Science and Technology (Guangzhou)

| Task | Model | CS | BS | AIG Mem | Pregated-MoE Mem | Pregated-MoE Sav. (%) | SE-MoE Mem | SE-MoE Sav. (%) | ExpertFlow with Predictor Mem | ExpertFlow with Predictor Sav. (%) | ExpertFlow w/o Predictor Mem | ExpertFlow w/o Predictor Sav. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XSUM | Switch-32 | 4 | 8 | 3.99 | 1.45 | 63.76 | 1.76 | 55.82 | 1.02 | 74.30 | 0.93 | 76.58 |
| XSUM | Switch-32 | 8 | 16 | 4.08 | 2.42 | 40.60 | 2.60 | 36.31 | 1.58 | 61.33 | 1.43 | 64.94 |
| XSUM | Switch-32 | 16 | 32 | 4.17 | 3.59 | 13.80 | 3.79 | 9.13 | 2.70 | 35.13 | 2.41 | 42.07 |
| XSUM | Switch-64 | 4 | 8 | 7.74 | 2.18 | 71.85 | 2.53 | 67.37 | 1.03 | 86.74 | 0.94 | 87.82 |
| XSUM | Switch-64 | 8 | 16 | 7.85 | 2.94 | 62.49 | 3.29 | 58.12 | 1.60 | 79.63 | 1.44 | 81.62 |
| XSUM | Switch-64 | 16 | 32 | 7.96 | 4.18 | 47.42 | 4.55 | 42.85 | 2.70 | 66.06 | 2.44 | 69.32 |
| XSUM | Switch-128 | 4 | 8 | 15.26 | 3.64 | 76.12 | 4.05 | 73.45 | 1.01 | 93.35 | 0.96 | 93.72 |
| XSUM | Switch-128 | 8 | 16 | 15.36 | 3.74 | 75.63 | 4.15 | 73.01 | 1.58 | 89.71 | 1.47 | 90.43 |
| XSUM | Switch-128 | 16 | 32 | 15.50 | 5.24 | 66.17 | 5.67 | 63.42 | 2.71 | 82.54 | 2.50 | 83.88 |
| XSUM | Average memory saving (%) | | | | | 57.54 | | 53.28 | | 74.31 | | 76.71 |
| WMT16 | Switch-32 | 4 | 8 | 3.95 | 1.38 | 65.02 | 1.72 | 56.39 | 0.94 | 76.17 | 0.92 | 76.72 |
| WMT16 | Switch-32 | 8 | 16 | 3.97 | 1.39 | 65.02 | 1.76 | 55.62 | 1.43 | 64.04 | 1.39 | 65.08 |
| WMT16 | Switch-32 | 16 | 32 | 4.03 | 2.47 | 38.54 | 2.68 | 33.47 | 2.37 | 41.05 | 2.30 | 42.84 |
| WMT16 | Switch-64 | 4 | 8 | 7.70 | 2.12 | 72.54 | 2.39 | 68.93 | 0.95 | 87.68 | 0.92 | 88.06 |
| WMT16 | Switch-64 | 8 | 16 | 7.72 | 2.12 | 72.52 | 2.48 | 67.90 | 1.43 | 81.46 | 1.39 | 81.99 |
| WMT16 | Switch-64 | 16 | 32 | 7.78 | 2.28 | 70.66 | 2.66 | 65.85 | 2.38 | 69.45 | 2.30 | 70.37 |
| WMT16 | Switch-128 | 4 | 8 | 15.21 | 3.58 | 76.46 | 4.01 | 73.62 | 0.96 | 93.72 | 0.93 | 93.91 |
| WMT16 | Switch-128 | 8 | 16 | 15.23 | 3.59 | 76.44 | 4.00 | 73.70 | 1.43 | 90.58 | 1.39 | 90.85 |
| WMT16 | Switch-128 | 16 | 32 | 15.28 | 3.75 | 75.48 | 4.18 | 72.63 | 2.40 | 84.30 | 2.33 | 84.78 |
| WMT16 | Average memory saving (%) | | | | | 68.08 | | 63.12 | | 76.49 | | 77.18 |
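Each "Average memory saving (%)" row is consistent with a plain, unweighted mean of the nine per-configuration Sav. (%) values in its column (three Switch models times three CS/BS settings). The Python sketch below re-derives those averages from values transcribed from the table. It also includes a hypothetical helper, `saving_pct`, which assumes that Sav. (%) is the percentage reduction of a method's memory relative to the AIG column. That definition is our assumption rather than something the table states, and because the Mem figures are rounded to two decimals the recomputed percentages only approximately match the reported ones (for example, 63.66 versus 63.76 for Pregated-MoE on XSUM, Switch-32, CS 4, BS 8).

```python
from statistics import mean

# Sav. (%) per configuration, transcribed from the table in row order:
# Switch-32, Switch-64, Switch-128, each with (CS, BS) = (4, 8), (8, 16), (16, 32).
REPORTED_SAVINGS = {
    "XSUM": {
        "Pregated-MoE": [63.76, 40.60, 13.80, 71.85, 62.49, 47.42, 76.12, 75.63, 66.17],
        "SE-MoE": [55.82, 36.31, 9.13, 67.37, 58.12, 42.85, 73.45, 73.01, 63.42],
        "ExpertFlow with Predictor": [74.30, 61.33, 35.13, 86.74, 79.63, 66.06, 93.35, 89.71, 82.54],
        "ExpertFlow w/o Predictor": [76.58, 64.94, 42.07, 87.82, 81.62, 69.32, 93.72, 90.43, 83.88],
    },
    "WMT16": {
        "Pregated-MoE": [65.02, 65.02, 38.54, 72.54, 72.52, 70.66, 76.46, 76.44, 75.48],
        "SE-MoE": [56.39, 55.62, 33.47, 68.93, 67.90, 65.85, 73.62, 73.70, 72.63],
        "ExpertFlow with Predictor": [76.17, 64.04, 41.05, 87.68, 81.46, 69.45, 93.72, 90.58, 84.30],
        "ExpertFlow w/o Predictor": [76.72, 65.08, 42.84, 88.06, 81.99, 70.37, 93.91, 90.85, 84.78],
    },
}

# Values printed in the "Average memory saving (%)" rows.
REPORTED_AVERAGES = {
    "XSUM": {"Pregated-MoE": 57.54, "SE-MoE": 53.28,
             "ExpertFlow with Predictor": 74.31, "ExpertFlow w/o Predictor": 76.71},
    "WMT16": {"Pregated-MoE": 68.08, "SE-MoE": 63.12,
              "ExpertFlow with Predictor": 76.49, "ExpertFlow w/o Predictor": 77.18},
}


def average_saving(savings):
    """Unweighted mean over the nine configurations, rounded to two decimals."""
    return round(mean(savings), 2)


def saving_pct(aig_mem, method_mem):
    """Hypothetical helper. ASSUMPTION: Sav. (%) = 100 * (1 - Mem / AIG Mem)."""
    return 100.0 * (1.0 - method_mem / aig_mem)


if __name__ == "__main__":
    for task, methods in REPORTED_SAVINGS.items():
        for method, savings in methods.items():
            reported = REPORTED_AVERAGES[task][method]
            print(f"{task:6s} {method:26s} {average_saving(savings):6.2f} (reported {reported:.2f})")
    # XSUM, Switch-32, CS 4, BS 8, Pregated-MoE, recomputed from the rounded Mem values.
    print(f"{saving_pct(3.99, 1.45):.2f}")
```

All eight recomputed means match the reported averages to two decimals, which supports reading the average rows as simple means rather than memory-weighted ones.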