notesum.ai

Published on November 27

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

cs.CL
cs.AI

Release Date: November 27, 2024

Authors: Ziyin Zhang1, Jiahao Xu2, Tian Liang2, Xingyu Chen1, Zhiwei He1, Rui Wang3, Zhaopeng Tu2

Affiliations: 1Shanghai Jiao Tong University, Tencent AI Lab; 2Tencent AI Lab; 3Shanghai Jiao Tong University

arXiv: http://arxiv.org/abs/2411.18462v1