notesum.ai

Published on May 10

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

NeurIPS
Spotlight

Release Date: May 10, 2024

Authors: Wenyu Du¹, Tongxu Luo², Zihan Qiu³, Zeyu Huang⁴, Yikang Shen⁵, Reynold Cheng¹, Yike Guo², Jie Fu²

Affiliations: ¹School of Computing and Data Science, The University of Hong Kong; ²HKUST; ³Tsinghua University; ⁴University of Edinburgh; ⁵MIT-IBM Watson AI Lab

OpenReview: https://openreview.net/pdf/2c7146fd80b9afb7935702797fb5bfb715dfb090.pdf