ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
cs.LG
cs.AI
cs.PF
Released Date: December 4, 2024
Authors: Guangda Liu¹, Chengwei Li, Jieru Zhao, Chenqi Zhang, Minyi Guo
Aff.: ¹Shanghai Jiao Tong University

| Method | 256 | 512 | 1024 | 2048 |
|---|---|---|---|---|
| Quest | 35.63 | 40.83 | 43.23 | 45.59 |
| InfiniGen | 43.69 | 45.04 | 45.13 | 45.14 |
| ClusterKV (ours) | 46.69 | 48.02 | 48.34 | 48.70 |
| Full KV | 49.01 | | | |