WDMoE: Wireless Distributed Mixture of Experts for Large Language Models
Subjects: cs.LG, cs.AI, cs.DC, cs.IT, math.IT
Release Date: November 11, 2024
Authors: Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Wenjun Zhang, Ping Zhang

| Model | Active Params | MMLU | PIQA | ARC-E | ARC-C | HumanEval | GSM-8K | BoolQ | MBPP |
|---|---|---|---|---|---|---|---|---|---|
| Llama 2 7B | 7B | 46.8% | 78.3% | 56.1% | 40.3% | 12.8% | 16.7% | 74.9% | 14.8% |
| Llama 2 13B | 13B | 55.0% | 79.8% | 71.8% | 60.3% | 18.9% | 29.6% | 82.4% | 26.8% |
| Llama 2 70B | 70B | 69.7% | 82.5% | 85.9% | 78.3% | 26.2% | 63.5% | 87.7% | 39.6% |
| Mistral 7B-v0.1 | 7B | 64.1% | 81.6% | 83.6% | 74.2% | 22.6% | 47.5% | 84.1% | 32.0% |
| Mixtral 8x7B-Instruct-v0.1 | 13B | 70.0% | 83.2% | 92.8% | 84.8% | 47.6% | 70.9% | 88.72% | 35.2% |
| WDMoE | 13B | 68.98% | 83.51% | 93.12% | 86.78% | 48.17% | 71.29% | 88.87% | 37.4% |
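
As a quick sanity check on the table, the short Python sketch below (our own illustration, not code from the paper) averages each model's scores across the eight benchmarks; the values are copied verbatim from the table above.

```python
# Minimal sketch (not from the paper): compare mean benchmark accuracy
# per model. Scores are copied verbatim from the table above.
scores = {
    "Llama 2 7B":                  [46.8, 78.3, 56.1, 40.3, 12.8, 16.7, 74.9, 14.8],
    "Llama 2 13B":                 [55.0, 79.8, 71.8, 60.3, 18.9, 29.6, 82.4, 26.8],
    "Llama 2 70B":                 [69.7, 82.5, 85.9, 78.3, 26.2, 63.5, 87.7, 39.6],
    "Mistral 7B-v0.1":             [64.1, 81.6, 83.6, 74.2, 22.6, 47.5, 84.1, 32.0],
    "Mixtral 8x7B-Instruct-v0.1":  [70.0, 83.2, 92.8, 84.8, 47.6, 70.9, 88.72, 35.2],
    "WDMoE":                       [68.98, 83.51, 93.12, 86.78, 48.17, 71.29, 88.87, 37.4],
}
benchmarks = ["MMLU", "PIQA", "ARC-E", "ARC-C", "HumanEval", "GSM-8K", "BoolQ", "MBPP"]

# Print each model's unweighted mean over the eight benchmarks.
for model, vals in scores.items():
    avg = sum(vals) / len(vals)
    print(f"{model:<28} mean over {len(benchmarks)} benchmarks: {avg:.2f}%")
```

Averaged this way, WDMoE edges out Mixtral 8x7B-Instruct-v0.1 (about 72.3% vs 71.7%) at the same 13B active parameter count, and both clearly surpass Llama 2 70B despite activating far fewer parameters per token.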