EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Categories: cs.DC, cs.AI
Released Date: December 10, 2024
Authors: Jialiang Cheng¹, Ning Gao¹, Yun Yue¹, Zhiling Ye¹, Jiadi Jiang¹, Jian Sha¹
Aff.: ¹Ant Group

| Benchmark | FineWeb-Edu: Baseline | FineWeb-Edu: PLS | FineWeb-Edu: DiLoCo | FineWeb-Edu: CO2 | FineWeb-Edu: EDiT | FineWeb-Edu: A-EDiT | in-house: Baseline | in-house: DiLoCo | in-house: EDiT | in-house: A-EDiT |
|---|---|---|---|---|---|---|---|---|---|---|
| MMLU | 32.28 | 30.86 | 32.55 | 31.33 | 32.29 | 31.96 | 24.12 | 24.63 | 24.47 | 24.56 |
| ARC-E | 59.90 | 57.60 | 58.60 | 57.00 | 60.00 | 57.70 | 36.80 | 36.70 | 38.70 | 37.40 |
| ARC-C | 30.20 | 28.60 | 31.00 | 30.50 | 32.40 | 30.20 | 22.50 | 22.80 | 23.00 | 22.40 |
| HellaSwag | 50.99 | 48.03 | 51.64 | 48.66 | 51.75 | 51.60 | 40.60 | 40.80 | 40.90 | 40.20 |
| PIQA | 69.90 | 67.80 | 69.50 | 67.00 | 68.10 | 69.90 | 67.10 | 66.80 | 67.00 | 66.40 |
| CommonSense-QA | 37.40 | 33.80 | 35.40 | 34.40 | 36.30 | 35.30 | 18.50 | 18.20 | 18.50 | 17.90 |
| OpenBookQA | 25.40 | 22.80 | 25.20 | 24.40 | 26.00 | 24.00 | 18.00 | 17.80 | 18.00 | 18.20 |
| WinoGrande | 50.70 | 49.20 | 47.80 | 50.70 | 51.70 | 50.50 | 49.10 | 49.20 | 49.10 | 48.80 |
| Average | 44.60 | 42.34 | 43.96 | 43.00 | 44.82 | 43.90 | 34.59 | 34.62 | 34.96 | 34.49 |
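The Average row can be sanity-checked against the eight benchmark rows. The sketch below assumes the average is the unweighted mean over the eight benchmarks, which the page does not state explicitly. The names `COLUMNS`, `SCORES`, `REPORTED_AVERAGE`, `column_means`, and `max_abs_gap` are hypothetical helpers introduced only for this illustration, and the numbers are transcribed from the table above.

```python
# Assumption: "Average" is the plain (unweighted) mean of the eight benchmarks.
# All names here are illustrative; the values are copied from the table.
COLUMNS = [
    "FineWeb-Edu/Baseline", "FineWeb-Edu/PLS", "FineWeb-Edu/DiLoCo",
    "FineWeb-Edu/CO2", "FineWeb-Edu/EDiT", "FineWeb-Edu/A-EDiT",
    "in-house/Baseline", "in-house/DiLoCo", "in-house/EDiT", "in-house/A-EDiT",
]

# One row per benchmark, in table order: MMLU, ARC-E, ARC-C, HellaSwag,
# PIQA, CommonSense-QA, OpenBookQA, WinoGrande.
SCORES = [
    [32.28, 30.86, 32.55, 31.33, 32.29, 31.96, 24.12, 24.63, 24.47, 24.56],
    [59.90, 57.60, 58.60, 57.00, 60.00, 57.70, 36.80, 36.70, 38.70, 37.40],
    [30.20, 28.60, 31.00, 30.50, 32.40, 30.20, 22.50, 22.80, 23.00, 22.40],
    [50.99, 48.03, 51.64, 48.66, 51.75, 51.60, 40.60, 40.80, 40.90, 40.20],
    [69.90, 67.80, 69.50, 67.00, 68.10, 69.90, 67.10, 66.80, 67.00, 66.40],
    [37.40, 33.80, 35.40, 34.40, 36.30, 35.30, 18.50, 18.20, 18.50, 17.90],
    [25.40, 22.80, 25.20, 24.40, 26.00, 24.00, 18.00, 17.80, 18.00, 18.20],
    [50.70, 49.20, 47.80, 50.70, 51.70, 50.50, 49.10, 49.20, 49.10, 48.80],
]

REPORTED_AVERAGE = [44.60, 42.34, 43.96, 43.00, 44.82,
                    43.90, 34.59, 34.62, 34.96, 34.49]


def column_means(rows):
    """Return the unweighted mean of each column across all benchmark rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


def max_abs_gap(computed, reported):
    """Largest absolute difference between computed and reported averages."""
    return max(abs(c - r) for c, r in zip(computed, reported))


MEANS = column_means(SCORES)

for name, mean, reported in zip(COLUMNS, MEANS, REPORTED_AVERAGE):
    print(f"{name:22s} computed={mean:.4f} reported={reported:.2f}")
print(f"largest gap: {max_abs_gap(MEANS, REPORTED_AVERAGE):.4f}")
```

With the values as transcribed, every recomputed mean stays within 0.01 of the reported Average row. The small residual gaps are consistent with the per-benchmark scores having been rounded before display, so the check supports the unweighted-mean reading but does not prove it.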