notesum.ai

Published on November 2, 2024

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

cs.DC
cs.AI
cs.LG

Release Date: November 2, 2024

Authors: Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu

arXiv: http://arxiv.org/abs/2411.01142v1