notesum.ai

Published on December 6

Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern

cs.CL
cs.LG

Release Date: December 6, 2024

Authors: Hongyin Tang¹, Di Xiu², Lanrui Wang³, Xiurui Geng⁴, Jingang Wang¹, Xunliang Cai¹

Affiliations: ¹Meituan Inc., Beijing, China; ²Chinese Academy of Sciences, Beijing, China; ³Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; ⁴Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China

arXiv: http://arxiv.org/pdf/2412.04757v1