Published: April 29
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
NeurIPS
Release Date: April 29, 2024
Authors: Cheng Luo¹, Jiawei Zhao², Zhuoming Chen³, Beidi Chen³, Anima Anandkumar¹
Affiliations: ¹California Institute of Technology; ²Meta FAIR; ³Carnegie Mellon University
OpenReview PDF: https://openreview.net/pdf/d291323c2636eacc38c4c3399f3ac1d69c920a5e.pdf