Published on November 12
The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving
Categories: cs.PF, cs.AI
Release Date: November 12, 2024
Authors: Kyoungmin Kim¹, Kijae Hong², Caglar Gulcehre¹, Anastasia Ailamaki¹
Affiliations: ¹École Polytechnique Fédérale de Lausanne, Switzerland; ²CERES TECHNOLOGIES, South Korea
