BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
cs.CL
cs.AI
Release Date: October 30, 2024
Authors: Junqi Zhao1, Zhijin Fang2, Shu Li3, Shaohui Yang4, Shichao He5
Affiliations: 1Department of Computer Science, New York University, New York, USA; 2Department of Statistics and Data Science, University of California, Los Angeles, USA; 3Department of Mathematical Sciences, Beihang University, Beijing, China; 4School of Computer Science, University of Leeds, Leeds, United Kingdom; 5Department of Computer Science and Applied Math, Brandeis University, Waltham, USA

Mean score (%) for each method at varying KV cache budgets (fraction of the full cache retained):

| Method | 60% | 50% | 40% | 30% | 20% | 10% |
|---|---|---|---|---|---|---|
| BUZZ | 11.9 | 12.1 | 12.3 | 11.9 | 11.3 | 10.9 |
| BUZZ (with logn) | 12.3 | 12.0 | 12.0 | 11.6 | 11.6 | 10.4 |
| H2O | 11.5 | 11.4 | 11.6 | 11.4 | 11.8 | 10.2 |
| StreamingLLM | 11.9 | 11.1 | 11.2 | 10.8 | 10.0 | 3.8 |
| Local | 7.6 | 4.4 | 2.9 | 1.8 | 1.4 | 0.1 |
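
This summary does not include the details of BUZZ's beehive-structured cache, but the "heavy hitters" idea it builds on (and that the H2O baseline above uses) can be sketched as: keep the cached keys that have received the most accumulated attention, plus a recent local window. The following is a minimal illustrative sketch under those assumptions, not the BUZZ algorithm itself; all function and parameter names are hypothetical.

```python
# Illustrative heavy-hitter KV-cache eviction sketch (H2O-style baseline),
# NOT the BUZZ beehive-structured method. Names are hypothetical.
from typing import List

def select_kv_slots(attn: List[List[float]], budget: int, recent: int) -> List[int]:
    """attn[q][k] is the attention weight of query step q on cached key k.
    Returns the indices of the key slots to retain, sorted ascending."""
    n_keys = len(attn[0])
    # Accumulated attention mass per key -- the "heavy hitter" score.
    scores = [sum(row[k] for row in attn) for k in range(n_keys)]
    # Always keep a local window of the most recent keys.
    recent_idx = set(range(max(0, n_keys - recent), n_keys))
    # Rank the remaining keys by score and fill the rest of the budget.
    ranked = sorted((k for k in range(n_keys) if k not in recent_idx),
                    key=lambda k: scores[k], reverse=True)
    keep = recent_idx | set(ranked[: max(0, budget - len(recent_idx))])
    return sorted(keep)

# Example: 3 query steps over 6 cached keys, budget of 4 slots, 2 recent.
attn = [
    [0.50, 0.10, 0.05, 0.05, 0.20, 0.10],
    [0.40, 0.05, 0.05, 0.10, 0.20, 0.20],
    [0.45, 0.05, 0.05, 0.05, 0.15, 0.25],
]
print(select_kv_slots(attn, budget=4, recent=2))  # -> [0, 1, 4, 5]
```

The sharp drop of the "Local" row at small budgets in the table above is consistent with this picture: a recent-window-only policy discards high-attention early tokens that heavy-hitter methods retain.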