LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models
Subjects: cs.CL, cs.AI, cs.LG
Release Date: November 11, 2024
Authors: Runming Yang1, Taiqiang Wu2, Jiahao Wang3, Pengfei Hu4, Ngai Wong2, Yujiu Yang1
Affiliations: 1Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; 2Department of EEE, The University of Hong Kong, Hong Kong, China; 3Department of Computer Science, The University of Hong Kong, Hong Kong, China; 4PCG, Tencent, Beijing, China

Training cost (GPU memory, wall-clock time) and benchmark accuracy (%) for two teacher-student pairs: Llama 3.1 8B distilled into a pruned Llama 3 1B (left), and Llama 2 7B distilled into TinyLlama 1.1B (right).

| Metric | Teacher (Llama 3.1 8B) | Student (pruned 1B) | SFT | LoRA | KD | LLM-Neo | Teacher (Llama 2 7B) | Student (TinyLlama 1.1B) | SFT | LoRA | KD | LLM-Neo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mem | - | - | 63G | 68G | 231G | 177G | - | - | 66G | 42G | 167G | 136G |
| Time | - | - | 10min | 7min | 25min | 20min | - | - | 13min | 12min | 26min | 25min |
| ARC-e | 81.90 | 28.07 | 30.39 | 32.95 | 34.85 | 34.89 | 76.73 | 60.27 | 61.62 | 61.53 | 61.20 | 60.52 |
| CEVAL | 53.94 | 25.33 | 25.63 | 24.15 | 23.63 | 24.00 | 34.47 | 24.96 | 21.17 | 23.85 | 23.70 | 25.11 |
| HellaS. | 59.10 | 26.00 | 26.67 | 26.67 | 27.08 | 27.14 | 56.47 | 44.99 | 46.07 | 45.66 | 45.89 | 45.53 |
| PIQA | 80.09 | 53.92 | 54.41 | 56.09 | 57.45 | 56.58 | 78.35 | 74.34 | 72.69 | 72.58 | 73.18 | 72.31 |
| WinoG. | 73.72 | 50.43 | 51.38 | 51.85 | 52.64 | 52.64 | 71.03 | 58.72 | 59.27 | 59.91 | 59.67 | 60.54 |
| Avg. | 69.35 | 36.35 | 37.58 | 38.34 | 39.13 | 39.21 | 63.41 | 52.66 | 52.16 | 52.71 | 52.73 | 52.80 |
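As the title suggests, LLM-Neo combines LoRA with knowledge distillation: the distillation signal updates only low-rank adapters while the student's base weights stay frozen, which is consistent with the lower memory and time of LLM-Neo versus plain KD in the table above. The sketch below is an illustrative plain-Python rendering of the two ingredients, not the paper's code; the Hinton-style temperature-scaled KL loss and the function names are my assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard Hinton-style distillation."""
    T = temperature
    p = softmax([z / T for z in teacher_logits])
    q = softmax([z / T for z in student_logits])
    return T * T * sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

def matvec(M, x):
    """Plain matrix-vector product (M is a list of rows)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(x, W, A, B, alpha=8.0):
    """y = W x + (alpha/r) * B (A x).
    W is the frozen base weight; only the low-rank factors A (r x d_in)
    and B (d_out x r) would receive gradients from kd_loss."""
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

When the teacher and student produce identical logits the KD loss is zero, and with B initialized to zeros the LoRA path is a no-op at the start of training, so the student begins exactly at its pruned/initialized weights.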