Direct Quantized Training of Language Models with Stochastic Rounding
cs.LG
cs.CL
Release Date: December 6, 2024
Authors: Kaiyan Zhao¹, Tsuguchika Tabaru², Kenichi Kobayashi², Takumi Honda², Masafumi Yamazaki², Yoshimasa Tsuruoka¹
Affiliations: ¹The University of Tokyo; ²Fujitsu Limited

| Method | Loss | PPL |
|---|---|---|
| FP32 (reproduced) | 5.39 | 41.93 |
| BitNet b1.58 (reproduced) | 5.52 | 45.83 |
| DQT 1.58 bits (ours) | 6.20 | 73.41 |
| DQT 8 bits (ours) | 5.80 | 55.75 |
| DQT 8 bits (ternary inference)† | 5.93 | 60.98 |