
Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation

cs.LG
cs.PF

Release Date: November 26, 2024

Authors: Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram

Affiliation: University of Southern California

arXiv: http://arxiv.org/abs/2411.17089v1