
Published on November 7

SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference

Categories: cs.CL, cs.AI, cs.DC, cs.LG

Release Date: November 7, 2024

Authors: Gabriele Oliaro (1), Zhihao Jia (2), Daniel Campos (1), Aurick Qiao (1)

Affiliations: (1) Snowflake AI Research; (2) Carnegie Mellon University

arXiv: http://arxiv.org/abs/2411.04975v1