notesum.ai

Published on November 27

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

cs.CL

cs.AI

Release Date: November 27, 2024

Authors: Ziyin Zhang¹, Jiahao Xu², Tian Liang², Xingyu Chen¹, Zhiwei He¹, Rui Wang³, Zhaopeng Tu²

Aff.: ¹Shanghai Jiao Tong University, Tencent AI Lab; ²Tencent AI Lab; ³Shanghai Jiao Tong University

arXiv: http://arxiv.org/abs/2411.18462v1

Models	Methods	MT-Bench	Trans.	Sum.	QA	Math	RAG	Avg.
Pythia (6.9B, 160M)	Const.	1.45	1.47	1.24	1.43	1.52	1.42	1.42
	Heuristics	1.51	1.58	1.34	1.58	1.64	1.51	1.53
	SVIP	1.63	1.62	1.45	1.67	1.72	1.66	1.63 (+14.8%)
Qwen2.5 (14B, 0.5B)	Const.	1.08	0.87	1.11	0.92	1.43	0.99	1.07
	Heuristics	1.10	0.91	1.10	0.92	1.34	1.03	1.07
	SVIP	1.33	1.12	1.37	1.14	1.57	1.23	1.29 (+20.6%)
LLaMA-3 (70B, 8B)	Const.	2.04	2.48	2.56	2.34	2.32	2.28	2.34
	Heuristics	2.30	3.13	3.33	2.61	2.52	2.63	2.76
	SVIP	2.31	3.04	3.48	2.63	2.89	2.59	2.83 (+20.9%)
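
The percentages next to the SVIP averages are not defined in the table itself. A quick check suggests they are the relative gain of the SVIP average over the Const. average for the same model pair. The sketch below reproduces them from the rounded "Avg." values. The helper name `relative_gain` and the reading of the percentages as "SVIP versus Const." are assumptions for illustration, not something the source states.

```python
# "Avg." values copied from the table above (Const. and SVIP rows only).
avg = {
    "Pythia": {"Const.": 1.42, "SVIP": 1.63},
    "Qwen2.5": {"Const.": 1.07, "SVIP": 1.29},
    "LLaMA-3": {"Const.": 2.34, "SVIP": 2.83},
}


def relative_gain(baseline: float, value: float) -> float:
    """Percentage improvement of `value` over `baseline`, to one decimal place.

    Hypothetical helper: assumes the table's percentages compare SVIP to Const.
    """
    return round((value / baseline - 1.0) * 100.0, 1)


# Gain of SVIP over the constant-length baseline for each model pair.
gains = {model: relative_gain(row["Const."], row["SVIP"]) for model, row in avg.items()}

for model, gain in gains.items():
    print(f"{model}: +{gain}%")
```

Under this assumption the computed gains match the figures printed in the table (+14.8%, +20.6%, +20.9%), which is consistent with Const. being the reference baseline rather than Heuristics.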