
Recycled Attention: Efficient inference for long-context language models

cs.CL

Release Date: November 8, 2024

Authors: Fangyuan Xu (1), Tanya Goyal (2), Eunsol Choi (3)

Affiliations: (1) The University of Texas at Austin; (2) Cornell University; (3) New York University

arXiv: http://arxiv.org/abs/2411.05787v1