TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection

Subjects: cs.CL, cs.AI, cs.LG

Release Date: November 5, 2024

Authors: Wei Wu (1), Zhuoshi Pan (2), Chao Wang (1), Liyi Chen (1), Yunchu Bai (3), Kun Fu (4), Zheng Wang (4), Hui Xiong

Affiliations: (1) School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China; (2) School of Information Science and Technology, Tsinghua University, Beijing, China; (3) School of Management, University of Science and Technology of China, Hefei, China; (4) Alibaba Cloud Computing, Beijing, China

arXiv: http://arxiv.org/abs/2411.02886v1