BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
cs.CL
cs.AI
Release Date: October 30, 2024
Authors: Junqi Zhao1, Zhijin Fang2, Shu Li3, Shaohui Yang4, Shichao He5
Affiliations: 1Department of Computer Science, New York University, New York, USA; 2Department of Statistics and Data Science, University of California, Los Angeles, USA; 3Department of Mathematical Sciences, Beihang University, Beijing, China; 4School of Computer Science, University of Leeds, Leeds, United Kingdom; 5Department of Computer Science and Applied Math, Brandeis University, Waltham, USA

Mean score (%) for each method at varying KV cache budgets (fraction of the full cache retained):

| Method | 60% | 50% | 40% | 30% | 20% | 10% |
|---|---|---|---|---|---|---|
| BUZZ | 11.9 | 12.1 | 12.3 | 11.9 | 11.3 | 10.9 |
| BUZZ (with logn) | 12.3 | 12.0 | 12.0 | 11.6 | 11.6 | 10.4 |
| H2O | 11.5 | 11.4 | 11.6 | 11.4 | 11.8 | 10.2 |
| StreamingLLM | 11.9 | 11.1 | 11.2 | 10.8 | 10.0 | 3.8 |
| Local | 7.6 | 4.4 | 2.9 | 1.8 | 1.4 | 0.1 |
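
This summary does not include the details of BUZZ's beehive-structured cache, but the "heavy hitters" idea it builds on (and that the H2O baseline above uses) can be sketched as: keep the cached keys that have received the most accumulated attention, plus a recent local window. The following is a minimal illustrative sketch under those assumptions, not the BUZZ algorithm itself; all function and parameter names are hypothetical.

```python
# Illustrative heavy-hitter KV-cache eviction sketch (H2O-style baseline),
# NOT the BUZZ beehive-structured method. Names are hypothetical.
from typing import List

def select_kv_slots(attn: List[List[float]], budget: int, recent: int) -> List[int]:
    """attn[q][k] is the attention weight of query step q on cached key k.
    Returns the indices of the key slots to retain, sorted ascending."""
    n_keys = len(attn[0])
    # Accumulated attention mass per key -- the "heavy hitter" score.
    scores = [sum(row[k] for row in attn) for k in range(n_keys)]
    # Always keep a local window of the most recent keys.
    recent_idx = set(range(max(0, n_keys - recent), n_keys))
    # Rank the remaining keys by score and fill the rest of the budget.
    ranked = sorted((k for k in range(n_keys) if k not in recent_idx),
                    key=lambda k: scores[k], reverse=True)
    keep = recent_idx | set(ranked[: max(0, budget - len(recent_idx))])
    return sorted(keep)

# Example: 3 query steps over 6 cached keys, budget of 4 slots, 2 recent.
attn = [
    [0.50, 0.10, 0.05, 0.05, 0.20, 0.10],
    [0.40, 0.05, 0.05, 0.10, 0.20, 0.20],
    [0.45, 0.05, 0.05, 0.05, 0.15, 0.25],
]
print(select_kv_slots(attn, budget=4, recent=2))  # -> [0, 1, 4, 5]
```

The sharp drop of the "Local" row at small budgets in the table above is consistent with this picture: a recent-window-only policy discards high-attention early tokens that heavy-hitter methods retain.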