notesum.ai
Published on November 7
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
Categories: cs.CL, cs.AI, cs.DC, cs.LG
Released Date: November 7, 2024
Authors: Gabriele Oliaro¹, Zhihao Jia², Daniel Campos¹, Aurick Qiao¹
Aff.: ¹Snowflake AI Research; ²Carnegie Mellon University
