A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness
Subjects: cs.CL; cs.AI; cs.LG
MSC classes: 68T50 (Primary), 68T07 (Secondary)
ACM classes: I.2.7
Release Date: November 4, 2024
Authors: Fali Wang¹, Zhiwei Zhang¹, Xianren Zhang¹, Zongyu Wu¹, Tzuhao Mo², Qiuhao Lu³, Wanjing Wang³, Rui Li³, Junjie Xu¹, Xianfeng Tang⁴, Qi He⁴, Yao Ma⁵, Ming Huang³, Suhang Wang¹
Affiliations: ¹The Pennsylvania State University, University Park, USA; ²University of Pennsylvania, Philadelphia, USA; ³UTHealth Houston, Houston, USA; ⁴Amazon, Palo Alto, USA; ⁵Rensselaer Polytechnic Institute, Troy, USA

| Criteria | Pruning | Knowledge Distillation | Quantization | Low-Rank Techniques |
| --- | --- | --- | --- | --- |
| Definition | Removes unneeded parameters | Transfers knowledge from a larger to a smaller model | Lowers the precision of parameters | Uses low-rank decomposition on weights |
| Goal | Reduces size and computation | Shrinks model while retaining performance | Decreases size and speeds up processing | Reduces parameters and computation |
| Method | Cuts weights or layers based on importance | Smaller model mimics larger model’s output | Converts parameters to lower precision | Decomposes matrices into smaller components |
| Advantages | Reduces size and computation significantly | Preserves performance in smaller models | Speeds up inference, less storage | Efficient, mostly preserves performance |
| Disadvantages | May reduce accuracy, irregular memory access | Resource-intensive, requires large teacher model | Potential accuracy loss, may need specific hardware | Effectiveness varies, requires rank selection |
| Model Size Impact | High reduction | Significant reduction through knowledge transfer | High, tied to precision reduction | Moderate, reduces redundant parameters |
| Performance Impact | Possible degradation if over-pruned | Maintains if well distilled | Minor to moderate loss, depends on method | Mostly retains performance, may need tuning |
| Complexity | Moderate, needs parameter evaluation | High, involves dual-model training | Moderate, varies with precision level | Moderate, requires matrix factorization |
| Use Cases | Resource-limited settings | Efficient model creation for limited resources | Fast inference needs, edge devices | When weight matrices are redundant |
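To make the four techniques in the table concrete, here is a minimal NumPy sketch (not from the paper; all shapes, thresholds, and hyperparameters below are illustrative choices) showing magnitude pruning, per-tensor 8-bit quantization, SVD-based low-rank decomposition of a toy weight matrix, and the temperature-scaled distillation loss a student would minimize:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)  # toy weight matrix

# Pruning: zero out the smallest-magnitude 50% of weights.
threshold = np.quantile(np.abs(W), 0.5)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Quantization: round weights to int8 using a single per-tensor scale.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_q.astype(np.float32) * scale  # approximate reconstruction

# Low-rank: keep only the top-r singular components of W.
r = 16
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
# Storage: 64*r + r + r*64 values instead of 64*64 for the dense matrix.

# Knowledge distillation: KL divergence between teacher and student
# soft labels at temperature T (Hinton-style soft-target loss).
def softmax(z, T=1.0):
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

T = 2.0
teacher_logits = rng.normal(size=10)
student_logits = rng.normal(size=10)
p_t = softmax(teacher_logits, T)
p_s = softmax(student_logits, T)
kd_loss = float(np.sum(p_t * (np.log(p_t) - np.log(p_s)))) * T * T

print("pruned sparsity:", float((W_pruned == 0).mean()))
print("quantization max abs error:", float(np.abs(W - W_dequant).max()))
print("distillation loss:", kd_loss)
```

Note how the sketch mirrors the table's trade-offs: pruning and quantization shrink storage directly, the low-rank factors trade a small reconstruction error for fewer parameters, and distillation needs both a teacher and a student forward pass, which is why the table marks its complexity as high.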