Published at November 27
Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
cs.CL
cs.AI
Released Date: November 27, 2024
Authors: Ziyin Zhang¹, Jiahao Xu², Tian Liang², Xingyu Chen¹, Zhiwei He¹, Rui Wang³, Zhaopeng Tu²
Affiliations: ¹Shanghai Jiao Tong University, Tencent AI Lab; ²Tencent AI Lab; ³Shanghai Jiao Tong University

| Model | Method | MT-Bench | Trans. | Sum. | QA | Math | RAG | Avg. |
|---|---|---|---|---|---|---|---|---|
| Pythia | Const. | 1.45 | 1.47 | 1.24 | 1.43 | 1.52 | 1.42 | 1.42 |
| | Heuristics | 1.51 | 1.58 | 1.34 | 1.58 | 1.64 | 1.51 | 1.53 |
| | SVIP | 1.63 | 1.62 | 1.45 | 1.67 | 1.72 | 1.66 | 1.63 (+14.8%) |
| Qwen2.5 | Const. | 1.08 | 0.87 | 1.11 | 0.92 | 1.43 | 0.99 | 1.07 |
| | Heuristics | 1.10 | 0.91 | 1.10 | 0.92 | 1.34 | 1.03 | 1.07 |
| | SVIP | 1.33 | 1.12 | 1.37 | 1.14 | 1.57 | 1.23 | 1.29 (+20.6%) |
| LLaMA-3 | Const. | 2.04 | 2.48 | 2.56 | 2.34 | 2.32 | 2.28 | 2.34 |
| | Heuristics | 2.30 | 3.13 | 3.33 | 2.61 | 2.52 | 2.63 | 2.76 |
| | SVIP | 2.31 | 3.04 | 3.48 | 2.63 | 2.89 | 2.59 | 2.83 (+20.9%) |
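
The percentages in the Avg. column appear to be the relative gain of SVIP over the Const. row of the same model. That reading is inferred from the numbers and is not stated in the table itself. A minimal Python sketch that recomputes them from the tabulated averages (the names `averages` and `relative_gain` are illustrative, not from the source):

```python
# Avg. values copied from the table: (Const. baseline, SVIP) per model.
averages = {
    "Pythia": (1.42, 1.63),
    "Qwen2.5": (1.07, 1.29),
    "LLaMA-3": (2.34, 2.83),
}


def relative_gain(baseline: float, value: float) -> float:
    """Return the percentage gain of `value` over `baseline`."""
    return (value / baseline - 1.0) * 100.0


# Round to one decimal place to match the precision used in the table.
gains = {
    model: round(relative_gain(const, svip), 1)
    for model, (const, svip) in averages.items()
}

for model, gain in gains.items():
    print(f"{model}: +{gain}%")
```

Under this assumption the recomputed gains match the figures in parentheses, which supports reading Const. as the reference row for each model.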