Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
NeurIPS
Spotlight
Released Date: May 10, 2024
Authors: Wenyu Du¹, Tongxu Luo², Zihan Qiu³, Zeyu Huang⁴, Yikang Shen⁵, Reynold Cheng¹, Yike Guo², Jie Fu²
Affiliations: ¹School of Computing and Data Science, The University of Hong Kong; ²HKUST; ³Tsinghua University; ⁴University of Edinburgh; ⁵MIT-IBM Watson AI Lab
OpenReview: https://openreview.net/pdf/2c7146fd80b9afb7935702797fb5bfb715dfb090.pdf