PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
cs.CV
Released Date: December 4, 2024
Authors: Ao Wang¹, Hui Chen², Jianchao Tan³, Kefeng Zhang⁴, Xunliang Cai⁴, Zijia Lin, Jungong Han, Guiguang Ding
Aff.: ¹School of Software, Tsinghua University; ²BNRist, Tsinghua University; ³Department of Automation, Tsinghua University; ⁴Meituan Inc.

| Model | Method | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7B | Local | 66.0 / 0.22 | 105 / 0.14 | 70.0 / 0.18 | 47.5 / 0.17 | 33.8 / 0.19 | 14.7 / 0.30 | 5.50 / 0.41 | 4.78 / 0.50 | 4.03 / 0.55 |
| 7B | H2O | 54.5 / 0.28 | 48.3 / 0.31 | 32.0 / 0.33 | 18.3 / 0.32 | 12.9 / 0.34 | 7.50 / 0.41 | 4.28 / 0.51 | 4.16 / 0.53 | 3.72 / 0.57 |
| 7B | Elastic | 18.0 / 0.29 | 14.0 / 0.29 | 11.8 / 0.29 | 7.38 / 0.32 | 6.31 / 0.36 | 5.97 / 0.39 | 3.66 / 0.54 | 3.55 / 0.55 | 3.58 / 0.57 |
| 7B | Ours | 4.41 / 0.43 | 3.69 / 0.51 | 3.48 / 0.55 | 3.41 / 0.57 | 3.41 / 0.58 | 3.41 / 0.59 | 3.25 / 0.63 | 3.20 / 0.74 | 3.20 / 0.76 |
| 13B | Local | 60.0 / 0.15 | 139 / 0.12 | 56.3 / 0.21 | 16.1 / 0.27 | 13.2 / 0.31 | 7.06 / 0.37 | 3.72 / 0.48 | 3.72 / 0.52 | 3.25 / 0.55 |
| 13B | H2O | 12.4 / 0.39 | 10.4 / 0.39 | 8.50 / 0.40 | 4.56 / 0.46 | 3.78 / 0.49 | 3.58 / 0.49 | 3.16 / 0.55 | 3.28 / 0.57 | 3.06 / 0.59 |
| 13B | Elastic | 14.9 / 0.30 | 5.75 / 0.35 | 4.41 / 0.40 | 3.55 / 0.50 | 3.36 / 0.52 | 3.28 / 0.53 | 2.97 / 0.58 | 2.89 / 0.60 | 3.02 / 0.59 |
| 13B | Ours | 3.72 / 0.48 | 3.17 / 0.53 | 2.97 / 0.59 | 2.92 / 0.60 | 2.89 / 0.60 | 2.84 / 0.61 | 2.77 / 0.69 | 2.73 / 0.74 | 2.73 / 0.79 |