Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Categories: cs.CL, cs.LG
Released Date: November 27, 2024
Authors: Akshat Sharma¹, Hangliang Ding², Jianping Li¹, Neel Dani¹, Minjia Zhang¹
Affiliations: ¹University of Illinois Urbana-Champaign; ²Tsinghua University

| Model | Method | Single-Document QA: Qasper | Single-Document QA: MultifieldQA | Synthetic: Passage Ret. | Synthetic: Passage Count | Code: LCC | Code: RepoBench-P | Multi-Document QA: 2WikiMQA | Multi-Document QA: HotpotQA | Summarization: Gov Report | Summarization: Multi News | Few-Shot Learning: TREC | Few-Shot Learning: SamSum | Few-Shot Learning: TriviaQA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaMA2-7B-chat | Full Model | 22.78 | 33.59 | 8.44 | 4.75 | 59.56 | 48.07 | 22.35 | 24.88 | 24.99 | 23.60 | 59.67 | 39.38 | 85.38 | 35.19 |
| LLaMA2-7B-chat | KIVI | 22.45 | 33.32 | 11.33 | 4.25 | 59.05 | 47.96 | 21.88 | 23.88 | 24.46 | 22.86 | 59.67 | 38.74 | 84.80 | 34.97 |
| LLaMA2-7B-chat | H2O (15%) | 16.98 | 29.72 | 11.00 | 4.55 | 56.87 | 48.25 | 19.92 | 24.58 | 22.19 | 22.16 | 57.33 | 37.80 | 84.02 | 33.49 |
| LLaMA2-7B-chat | SnapKV (15%) | 17.41 | 34.53 | 8.67 | 3.59 | 58.48 | 47.52 | 21.00 | 24.91 | 19.04 | 19.74 | 59.33 | 37.92 | 84.72 | 33.60 |
| LLaMA2-7B-chat | MiniKV | 21.01 | 29.23 | 10.00 | 3.82 | 58.38 | 47.99 | 20.91 | 22.97 | 23.45 | 22.54 | 59.00 | 37.94 | 80.95 | 33.71 |
| LLaMA2-7B-chat | MiniKV Pyramid | 19.92 | 33.96 | 10.00 | 4.12 | 59.72 | 49.29 | 20.69 | 24.62 | 24.16 | 22.90 | 59.00 | 39.15 | 82.89 | 34.65 |
| LLaMA2-13B-chat | Full Model | 13.72 | 28.11 | 20.67 | 5.58 | 49.97 | 47.18 | 12.13 | 15.14 | 26.29 | 23.52 | 64.00 | 40.39 | 86.52 | 33.32 |
| LLaMA2-13B-chat | KIVI | 13.56 | 28.16 | 17.33 | 5.05 | 49.21 | 47.18 | 12.80 | 15.27 | 25.24 | 23.07 | 64.33 | 40.24 | 87.07 | 32.96 |
| LLaMA2-13B-chat | H2O (15%) | 11.94 | 25.13 | 15.67 | 4.61 | 48.18 | 44.29 | 13.04 | 14.52 | 23.15 | 22.12 | 59.67 | 39.66 | 83.70 | 31.2 |
| LLaMA2-13B-chat | SnapKV (15%) | 12.11 | 27.09 | 22.00 | 5.18 | 49.52 | 45.44 | 14.10 | 14.40 | 20.06 | 20.75 | 62.33 | 39.25 | 85.86 | 32.16 |
| LLaMA2-13B-chat | MiniKV | 11.24 | 25.13 | 15.00 | 3.62 | 48.43 | 46.10 | 12.74 | 16.16 | 24.26 | 22.84 | 63.33 | 40.79 | 84.33 | 31.84 |
| LLaMA2-13B-chat | MiniKV Pyramid | 12.79 | 27.32 | 17.00 | 2.79 | 48.94 | 46.25 | 12.66 | 15.47 | 25.06 | 23.14 | 63.67 | 40.35 | 85.33 | 32.37 |
| Mistral7B-instruct | Full Model | 25.79 | 47.97 | 50.83 | 2.98 | 50.69 | 47.22 | 27.44 | 36.44 | 31.84 | 25.82 | 62.67 | 40.49 | 86.29 | 41.2 |
| Mistral7B-instruct | KIVI | 25.13 | 46.30 | 50.75 | 3.02 | 51.16 | 46.81 | 26.39 | 35.11 | 31.23 | 25.36 | 62.33 | 40.12 | 86.31 | 40.77 |
| Mistral7B-instruct | H2O (15%) | 20.20 | 42.55 | 42.84 | 3.00 | 49.66 | 45.95 | 24.27 | 33.04 | 27.43 | 24.33 | 60.33 | 40.45 | 86.20 | 38.4 |
| Mistral7B-instruct | SnapKV (15%) | 24.14 | 48.32 | 50.23 | 3.04 | 50.39 | 45.76 | 25.76 | 34.55 | 25.10 | 22.77 | 61.67 | 40.12 | 86.90 | 39.90 |
| Mistral7B-instruct | MiniKV | 22.94 | 45.80 | 49.47 | 3.36 | 49.78 | 45.56 | 24.27 | 33.84 | 29.73 | 25.22 | 61.67 | 39.96 | 86.36 | 39.84 |
| Mistral7B-instruct | MiniKV Pyramid | 23.10 | 45.91 | 48.88 | 3.24 | 50.34 | 45.41 | 25.18 | 34.04 | 29.69 | 25.32 | 61.67 | 40.17 | 86.63 | 39.97 |
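
The Average column appears to be the unweighted mean of the 13 per-task scores. That is an assumption, since the table does not state the formula, but it is consistent with the LLaMA2-7B-chat rows. A minimal Python sketch that recomputes it for two rows copied from the table, plus the fraction of the full-model average that MiniKV retains (the names `scores`, `task_average`, and `retained` are illustrative, not from the paper):

```python
# Per-task scores for LLaMA2-7B-chat, copied from the table above
# (13 tasks each; the reported Average column is left out on purpose).
scores = {
    "Full Model": [22.78, 33.59, 8.44, 4.75, 59.56, 48.07, 22.35,
                   24.88, 24.99, 23.60, 59.67, 39.38, 85.38],
    "MiniKV": [21.01, 29.23, 10.00, 3.82, 58.38, 47.99, 20.91,
               22.97, 23.45, 22.54, 59.00, 37.94, 80.95],
}


def task_average(values):
    """Unweighted mean over tasks, rounded to two decimals (assumed convention)."""
    return round(sum(values) / len(values), 2)


averages = {method: task_average(vals) for method, vals in scores.items()}

# Fraction of the full-model average that the compressed cache retains.
retained = averages["MiniKV"] / averages["Full Model"]

for method, avg in averages.items():
    print(f"{method}: {avg}")
print(f"MiniKV retains {retained:.1%} of the full-model average")
```

Under this assumption the recomputed values match the reported averages for both rows (35.19 and 33.71). The same helper can be applied to any other row to check its Average entry.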