PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model
Subjects: cs.LG, cs.AI
Release Date: November 12, 2024
Authors: Yilun Liu¹, Yunpu Ma², Shuo Chen², Zifeng Ding³, Bailan He², Zhen Han², Volker Tresp²
Affiliations: ¹Technical University of Munich; ²Ludwig Maximilian University of Munich; ³Ludwig Maximilian University of Munich and University of Cambridge

| LLM | PEFT Arch. | Strategy | # Act. Params | % Act. Params | CR | AR |
|---|---|---|---|---|---|---|
| OLMoE 1B-7B (Top8/64) | LoRA4 | @Attn | 0.52M | 0.041 | 57.15 | 28.42 |
| | LoRA16 | PERFT-R (Top1/2) | 0.59M | 0.046 | 66.66 | 31.91 |
| | LoRA8 | PERFT-R (Top2/2) | 0.59M | 0.046 | 66.98 | 31.18 |
| | LoRA16 | @Attn | 2.10M | 0.164 | 62.86 | 29.71 |
| | LoRA4 | PERFT-E (Top8/64) | 2.10M | 0.164 | 69.42 | 31.30 |
| | LoRA32 | PERFT-R (Top1/4) | 2.23M | 0.174 | 67.32 | 32.29 |
| | LoRA64 | @Attn | 8.39M | 0.654 | 67.95 | 28.82 |
| | LoRA16 | PERFT-E (Top8/64) | 8.39M | 0.654 | 69.29 | 29.08 |
| | LoRA16 | PERFT-R (Top8/8) | 8.65M | 0.675 | 68.81 | 31.65 |
| Mixtral 13B-47B (Top2/8) | LoRA8 | @Attn | 3.41M | 0.026 | 85.02 | 64.72 |
| | LoRA8 | PERFT-R (Top2/2) | 4.46M | 0.035 | 86.23 | 69.03 |
| | LoRA8 | PERFT-R (Top2/8) | 5.24M | 0.046 | 85.68 | 68.14 |
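
The Strategy column contrasts LoRA placed on the attention modules (@Attn) with PERFT variants that add adapter experts inside the MoE layer; in PERFT-R (TopK/N), the table suggests that K of N adapter experts are activated per token by a dedicated router. As a rough illustration of that routed-adapter pattern, the PyTorch sketch below shows one way such a module could look. It is an assumption-laden illustration, not the paper's implementation; the class and parameter names (`RoutedLoRASidecar`, `num_experts`, `top_k`, etc.) are hypothetical.

```python
# Illustrative sketch only: names and routing details are assumptions,
# not the PERFT authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RoutedLoRASidecar(nn.Module):
    """Bank of low-rank adapter experts with its own top-k router.

    Roughly mirrors a PERFT-R (TopK/N) setting from the table: N adapter
    experts, of which K are activated per token.
    """

    def __init__(self, hidden_dim: int, rank: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Each adapter expert is a LoRA pair: down-projection A, up-projection B.
        self.lora_A = nn.ModuleList(
            nn.Linear(hidden_dim, rank, bias=False) for _ in range(num_experts)
        )
        self.lora_B = nn.ModuleList(
            nn.Linear(rank, hidden_dim, bias=False) for _ in range(num_experts)
        )
        for B in self.lora_B:
            nn.init.zeros_(B.weight)  # standard LoRA init: the update starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (num_tokens, hidden_dim) -> additive update of the same shape."""
        gate = F.softmax(self.router(x), dim=-1)                # (T, N)
        weights, idx = torch.topk(gate, self.top_k, dim=-1)     # (T, K)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize over top-k
        delta = torch.zeros_like(x)
        for e in range(len(self.lora_A)):
            # Tokens that routed expert e into their top-k, and the matching slot.
            token_idx, slot = (idx == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            h = self.lora_B[e](self.lora_A[e](x[token_idx]))
            delta.index_add_(0, token_idx, weights[token_idx, slot].unsqueeze(-1) * h)
        return delta
```

Under these assumptions, the frozen MoE expert FFNs would stay untouched and the sidecar's output would simply be added to theirs (e.g. `y = moe_ffn(x) + sidecar(x)`), so only the small router and the LoRA matrices are trained.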