TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
cs.CL
cs.AI
cs.LG
Released Date: November 5, 2024
Authors: Wei Wu¹, Zhuoshi Pan², Chao Wang¹, Liyi Chen¹, Yunchu Bai³, Kun Fu⁴, Zheng Wang⁴, Hui Xiong
Aff.: ¹School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China; ²School of Information Science and Technology, Tsinghua University, Beijing, China; ³School of Management, University of Science and Technology of China, Hefei, China; ⁴Alibaba Cloud Computing, Beijing, China

| Methods | En.Sum | En.QA | En.MC | En.Dia | Code.D | Math.F | R.PK | R.Num | R.KV | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2-7B | 23.80 | 14.92 | 54.59 | 8.50 | 28.17 | 19.71 | 28.81 | 28.64 | 19.00 | 25.13 |
| + NTK | 18.73 | 15.34 | 41.28 | 7.50 | 24.87 | 27.71 | 99.15 | 97.46 | 59.80 | 43.54 |
| + SelfExtend | 3.76 | 4.44 | 20.09 | 5.00 | 8.12 | 2.29 | 0.00 | 0.00 | 0.00 | 4.86 |
| + StreamingLLM | 19.60 | 13.61 | 48.03 | 3.50 | 27.92 | 19.43 | 5.08 | 5.08 | 2.40 | 16.07 |
| + InfLLM | 19.65 | 15.71 | 46.29 | 7.50 | 27.41 | 24.00 | 70.34 | 72.20 | 5.40 | 32.06 |
| + TokenSelect | 22.62 | 18.86 | 48.47 | 7.50 | 30.20 | 32.57 | 100.00 | 100.00 | 86.60 | 49.65 |
| Llama-3-8B | 24.70 | 15.50 | 44.10 | 7.50 | 27.92 | 21.70 | 8.50 | 7.80 | 6.20 | 18.21 |
| + NTK | 6.40 | 0.40 | 0.00 | 0.00 | 0.50 | 2.60 | 0.00 | 0.00 | 0.00 | 1.10 |
| + SelfExtend | 14.70 | 8.60 | 19.70 | 0.00 | 0.00 | 22.60 | 100.00 | 100.00 | 0.20 | 29.53 |
| + StreamingLLM | 20.40 | 14.30 | 40.60 | 5.00 | 28.43 | 21.40 | 8.50 | 8.30 | 0.40 | 16.37 |
| + InfLLM | 24.30 | 19.50 | 43.70 | 10.50 | 27.41 | 23.70 | 100.00 | 99.00 | 5.00 | 39.23 |
| + TokenSelect | 26.99 | 19.39 | 45.85 | 14.50 | 27.41 | 28.29 | 100.00 | 97.29 | 40.00 | 44.41 |
| Yi-1.5-6B | 18.78 | 10.48 | 39.74 | 5.00 | 29.95 | 16.00 | 5.08 | 5.08 | 0.00 | 14.45 |
| + NTK | 4.66 | 0.58 | 0.87 | 0.00 | 0.00 | 1.43 | 0.00 | 0.00 | 0.00 | 0.83 |
| + SelfExtend | 5.62 | 1.07 | 1.31 | 0.00 | 0.00 | 1.14 | 0.00 | 0.00 | 0.00 | 1.01 |
| + StreamingLLM | 15.35 | 9.26 | 35.81 | 5.00 | 27.41 | 14.29 | 5.08 | 4.92 | 0.00 | 13.01 |
| + InfLLM | 16.98 | 8.93 | 34.06 | 3.00 | 27.41 | 16.86 | 100.00 | 96.61 | 0.00 | 33.76 |
| + TokenSelect | 21.13 | 12.32 | 40.61 | 5.50 | 30.71 | 20.86 | 100.00 | 99.83 | 0.00 | 36.77 |
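
The Avg. column appears to be the unweighted mean of the nine task scores in each row. The page does not state this, so treat it as an assumption. A minimal sketch that checks it against two TokenSelect rows, with scores copied from the table above (the helper name `row_average` is illustrative, not from the paper):

```python
def row_average(scores):
    """Unweighted mean of the per-task scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


# Per-task scores copied from the table, in column order:
# En.Sum, En.QA, En.MC, En.Dia, Code.D, Math.F, R.PK, R.Num, R.KV
tokenselect_qwen2 = [22.62, 18.86, 48.47, 7.50, 30.20, 32.57, 100.00, 100.00, 86.60]
tokenselect_llama3 = [26.99, 19.39, 45.85, 14.50, 27.41, 28.29, 100.00, 97.29, 40.00]

# Compare against the reported Avg. values (49.65 and 44.41).
print(row_average(tokenselect_qwen2))
print(row_average(tokenselect_llama3))
```

Because the mean is unweighted, the three retrieval tasks (R.PK, R.Num, R.KV) account for a third of each average, which is why methods that score near 100 on retrieval can have a high Avg. despite modest scores elsewhere.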