notesum.ai

Published on November 12

The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving

cs.PF
cs.AI

Release Date: November 12, 2024

Authors: Kyoungmin Kim (1), Kijae Hong (2), Caglar Gulcehre (1), Anastasia Ailamaki (1)

Affiliations: (1) Ecole Polytechnique Fédérale de Lausanne, Switzerland; (2) CERES TECHNOLOGIES, South Korea

arXiv: http://arxiv.org/abs/2411.07447v1