BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference

cs.CL
cs.AI

Release Date: October 30, 2024

Authors: Junqi Zhao¹, Zhijin Fang², Shu Li³, Shaohui Yang⁴, Shichao He⁵

Affiliations: ¹Department of Computer Science, New York University, New York, USA; ²Department of Statistics and Data Science, University of California, Los Angeles, USA; ³Department of Mathematical Science, Beihang University, Beijing, China; ⁴School of Computer Science, University of Leeds, Leeds, United Kingdom; ⁵Department of Computer Science and Applied Math, Brandeis University, Waltham, USA

arXiv: http://arxiv.org/abs/2410.23079v1