notesum.ai
Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern
cs.CL
cs.LG
Released Date: December 6, 2024
Authors: Hongyin Tang (1), Di Xiu (2), Lanrui Wang (3), Xiurui Geng (4), Jingang Wang (1), Xunliang Cai (1)
Affiliations: (1) Meituan Inc., Beijing, China; (2) Chinese Academy of Sciences, Beijing, China; (3) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (4) Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
