S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
Subjects: cs.LG, cs.AI
Release Date: December 9, 2024
Authors: Xinyu Yang¹, Jixuan Leng¹, Geyang Guo², Jiawei Zhao³, Ryumei Nakada⁴, Linjun Zhang⁴, Huaxiu Yao⁵, Beidi Chen¹
Affiliations: ¹CMU; ²Georgia Tech; ³Caltech; ⁴Rutgers; ⁵UNC-Chapel Hill

Accuracy (%) on eight commonsense reasoning benchmarks ("Wino" = WinoGrande; "# Param (%)" = percentage of trainable parameters):

| Model | Method | # Param (%) | BoolQ | PIQA | SIQA | HellaSwag | Wino | ARC-e | ARC-c | OBQA | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ChatGPT | - | - | 73.1 | 85.4 | 68.5 | 78.5 | 66.1 | 89.8 | 79.9 | 74.8 | 77.0 |
| LLaMA-7B | Full FT | 100 | 70.3 | 84.2 | 80.1 | 92.3 | 85.4 | 86.6 | 72.8 | 83.4 | 81.9 |
| | Prefix-Tuning | 0.11 | 64.3 | 76.8 | 73.9 | 42.1 | 72.1 | 72.9 | 54.0 | 60.6 | 64.6 |
| | Series Adapter | 0.99 | 63.0 | 79.2 | 76.3 | 67.9 | 75.7 | 74.5 | 57.1 | 72.4 | 70.8 |
| | Parallel Adapter | 3.54 | 67.9 | 76.4 | 78.8 | 69.8 | 78.9 | 73.7 | 57.3 | 75.2 | 72.2 |
| | LoRA | 0.83 | 69.2 | 81.7 | 78.4 | 83.4 | 80.8 | 79.0 | 62.4 | 78.4 | 76.7 |
| | DoRA | 0.84 | 68.5 | 82.9 | 79.6 | 84.8 | 80.8 | 81.4 | 65.8 | 81.0 | 78.1 |
| | GaLore | 0.83† | 68.6 | 79.0 | 78.5 | 84.7 | 80.1 | 80.3 | 62.1 | 77.3 | 76.3 |
| | LoReFT | 0.03 | 69.3 | 84.4 | 80.3 | 93.1 | 84.2 | 83.2 | 68.2 | 78.9 | 80.2 |
| | LISA | 9.91 | 70.4 | 82.1 | 78.7 | 92.4 | 82.9 | 84.9 | 70.2 | 78.4 | 80.0 |
| | S$^{2}$FT (Ours) | 0.81 | 72.7 | 83.7 | 79.6 | 93.4 | 83.5 | 86.1 | 72.2 | 83.4 | 81.8 |
| LLaMA-13B | Full FT | 100 | 74.5 | 86.3 | 81.3 | 94.4 | 86.9 | 89.7 | 77.9 | 88.8 | 85.0 |
| | Prefix-Tuning | 0.03 | 65.3 | 75.4 | 72.1 | 55.2 | 68.6 | 79.5 | 62.9 | 68.0 | 68.4 |
| | Series Adapter | 0.80 | 71.8 | 83.0 | 79.2 | 88.1 | 82.4 | 82.5 | 67.3 | 81.8 | 79.5 |
| | Parallel Adapter | 2.89 | 72.5 | 84.9 | 79.8 | 92.1 | 84.7 | 84.2 | 71.2 | 82.4 | 81.4 |
| | LoRA | 0.67 | 72.1 | 83.5 | 80.5 | 90.5 | 83.7 | 82.8 | 68.3 | 82.4 | 80.5 |
| | DoRA | 0.68 | 72.4 | 84.9 | 81.5 | 92.4 | 84.2 | 84.2 | 69.6 | 82.8 | 81.5 |
| | LoReFT | 0.03 | 72.1 | 86.3 | 81.8 | 95.1 | 87.2 | 86.2 | 73.7 | 84.2 | 83.3 |
| | S$^{2}$FT (Ours) | 0.65 | 74.2 | 85.7 | 80.7 | 94.9 | 86.4 | 88.4 | 76.3 | 87.8 | 84.3 |
| LLaMA2-7B | Full FT | 100 | 74.7 | 84.9 | 78.7 | 93.7 | 84.1 | 87.5 | 75.2 | 85.0 | 83.0 |
| | LoRA | 0.83 | 69.8 | 79.9 | 79.5 | 83.6 | 82.6 | 79.8 | 64.7 | 81.0 | 77.6 |
| | DoRA | 0.84 | 71.8 | 83.7 | 76.0 | 89.1 | 82.6 | 83.7 | 68.2 | 82.4 | 79.7 |
| | S$^{2}$FT (Ours) | 0.81 | 72.9 | 86.1 | 80.2 | 94.3 | 85.5 | 87.2 | 74.6 | 83.4 | 83.0 |
| LLaMA3-8B | Full FT | 100 | 73.9 | 86.2 | 79.1 | 93.1 | 85.8 | 88.1 | 78.2 | 84.0 | 83.6 |
| | LoRA | 0.70 | 70.8 | 85.2 | 79.7 | 92.5 | 84.9 | 88.9 | 78.7 | 84.4 | 82.5 |
| | DoRA | 0.71 | 74.6 | 89.3 | 79.9 | 95.5 | 85.6 | 90.5 | 80.4 | 85.8 | 85.2 |
| | S$^{2}$FT (Ours) | 0.70 | 75.0 | 89.0 | 80.7 | 96.5 | 88.0 | 92.5 | 83.4 | 87.8 | 86.6 |
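For context on the "# Param (%)" column: like other PEFT methods in the table, S$^{2}$FT updates only a small fraction of the model's weights, organized as a structured subset rather than an arbitrary sparse pattern. The sketch below is a minimal, illustrative PyTorch example of structured-sparse fine-tuning in general, not the paper's actual selection algorithm: the random row selection, the `trainable_fraction` parameter, and the helper name `apply_structured_sparse_ft` are assumptions made purely for illustration.

```python
# Illustrative sketch only: freeze a model, then confine gradient updates to a
# structured subset of rows (output channels) in each Linear layer.
# The row-selection rule here is random and NOT the S^2FT algorithm.
import torch
import torch.nn as nn


def apply_structured_sparse_ft(model: nn.Module, trainable_fraction: float = 0.01):
    """Freeze all parameters, then re-enable gradients on whole rows of each
    Linear weight and mask out gradients for the remaining (frozen) rows."""
    for p in model.parameters():
        p.requires_grad_(False)

    hooks = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            out_features = module.weight.shape[0]
            n_rows = max(1, int(out_features * trainable_fraction))
            rows = torch.randperm(out_features)[:n_rows]  # illustrative selection
            mask = torch.zeros_like(module.weight)
            mask[rows] = 1.0

            module.weight.requires_grad_(True)

            # Zero the gradient of unselected rows so the optimizer only
            # touches the chosen structured slice of the weight matrix.
            def grad_mask(grad, mask=mask):
                return grad * mask

            hooks.append(module.weight.register_hook(grad_mask))
    return hooks


if __name__ == "__main__":
    toy = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
    apply_structured_sparse_ft(toy, trainable_fraction=0.05)
    loss = toy(torch.randn(8, 64)).sum()
    loss.backward()  # gradients are nonzero only on the selected rows
```

In this setup, the optimizer would be constructed only over parameters with `requires_grad=True`, and the gradient mask keeps the effective trainable-parameter count at roughly the chosen fraction, analogous in spirit to the small "# Param (%)" values reported in the table.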