Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention
Categories: cs.CL, cs.AI, cs.LG
Published: November 4, 2024
Authors: Xingtai Lv¹, Ning Ding¹, Kaiyan Zhang¹, Ermo Hua¹, Ganqu Cui², Bowen Zhou¹
Affiliations: ¹Department of Electronic Engineering, Tsinghua University; ²Shanghai AI Laboratory

Per-step training cost of a standard Transformer versus LPA at three model scales:

| Model | Params | Time per Step | GPU Memory |
| --- | --- | --- | --- |
| Transformer | 135M | 153.4 ms | 2302 MiB |
| LPA | 125M | 150.6 ms | 2276 MiB |
| Transformer | 369M | 351.0 ms | 4648 MiB |
| LPA | 319M | 322.9 ms | 4464 MiB |
| Transformer | 3.23B | 6.923 s | 71.94 GiB |
| LPA | 2.43B | 6.066 s | 70.26 GiB |
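
The parameter savings in the table come from replacing full-rank attention projections with low-dimensional (low-rank) ones: a dense d×d projection costs d² parameters, while a rank-r factorization costs only 2dr (for example, with d = 768 and r = 256, that is 589,824 vs. 393,216 parameters per projection). Below is a minimal PyTorch sketch of this general low-rank projection idea; the class names, the choice of rank, and which projections are factorized here are illustrative assumptions, not the paper's exact LPA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankLinear(nn.Module):
    """Factorize a d_in x d_out weight into (d_in x r) @ (r x d_out),
    cutting parameters from d_in*d_out to r*(d_in + d_out)."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> r
        self.up = nn.Linear(rank, d_out, bias=False)   # r -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class LowRankProjectedAttention(nn.Module):
    """Multi-head attention whose Q/K/V projections are low-rank
    factorized; the attention computation itself is unchanged.
    (Sketch only -- which matrices LPA factorizes is an assumption.)"""

    def __init__(self, d_model: int, n_heads: int, rank: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.q_proj = LowRankLinear(d_model, d_model, rank)
        self.k_proj = LowRankLinear(d_model, d_model, rank)
        self.v_proj = LowRankLinear(d_model, d_model, rank)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, d = x.shape
        h = self.n_heads
        # Project, then split into heads: (B, T, d) -> (B, h, T, d/h).
        q = self.q_proj(x).view(B, T, h, d // h).transpose(1, 2)
        k = self.k_proj(x).view(B, T, h, d // h).transpose(1, 2)
        v = self.v_proj(x).view(B, T, h, d // h).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, d))


# Usage with hypothetical sizes (not the paper's configuration):
# attn = LowRankProjectedAttention(d_model=768, n_heads=12, rank=256)
# y = attn(torch.randn(2, 128, 768))  # -> shape (2, 128, 768)
```

Because the factorized projections do strictly less matrix-multiply work than their full-rank counterparts, the parameter reduction also shows up as the slightly lower per-step time and GPU memory reported in the table above.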