Published on November 27
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving
cs.LG
cs.DC
Released Date: November 27, 2024
Authors: Ao Shen (1), Zhiyao Li (2), Mingyu Gao (3)
Affiliations: (1) Purdue University, West Lafayette, USA; (2) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (3) Shanghai Qi Zhi Institute, Shanghai, China

| Dynamic Block Group |
| + KV Cache Reuse |