Data movement limits to frontier model training
cs.DC
cs.AI
cs.LG
Released Date: November 2, 2024
Authors: Ege Erdil (1), David Schneider-Joseph (2)
Aff.: (1) Epoch AI; (2) David Schneider-Joseph

| Parallelism method | Network bandwidth per gradient step | Slices along… | Communications can coincide with… |
|---|---|---|---|
| Data parallelism | | | Nothing |
| Tensor parallelism | (for large ) | | Nothing |
| Pipeline parallelism | | | Expert parallelism |
| Expert parallelism | | | Pipeline parallelism |