Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation
cs.LG
cs.PF
Published: November 26, 2024
Authors: Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram
Affiliation: University of Southern California