notesum.ai

Published on November 27, 2024

Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache

cs.CL
cs.LG

Release Date: November 27, 2024

Authors: Akshat Sharma¹, Hangliang Ding², Jianping Li¹, Neel Dani¹, Minjia Zhang¹

Affiliations: ¹University of Illinois Urbana-Champaign; ²Tsinghua University

arXiv: http://arxiv.org/abs/2411.18077v1