Published at October 20
Lossless KV Cache Compression to 2%
cs.CV
cs.AI
cs.MA
Released Date: October 20, 2024
Authors: Zhen Yang¹, J. N. Han¹, Kan Wu¹, Ruobing Xie¹, An Wang², Xingwu Sun¹, Zhanhui Kang¹
Affiliations: ¹Tencent Hunyuan; ²Tencent Hunyuan, Tokyo Institute of Technology

| Method | KV Cache Memory |
|---|---|
| MHA | |
| GQA | |
| MLA | |
| CLA | |
| CLLA | |
| CLLA-quant | |