Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
cs.LG
Released Date: November 26, 2024
Authors: Vladimir Malinovskii¹, Andrei Panferov², Ivan Ilin³, Han Guo⁴, Peter Richtárik³, Dan Alistarh
Affiliations: ¹Yandex, HSE University; ²ISTA; ³GenAI CoE, KAUST; ⁴MIT

| Method | wbits | Wiki2 PPL | ArcC | ArcE | PiQA | Wino | HellaS | Avg | MMLU |
|---|---|---|---|---|---|---|---|---|---|
| FP16 | 16.00 | 5.607 | 51.28 | 81.52 | 80.03 | 73.72 | 60.01 | 69.31 | 65.35 |
| AF | 3.25 | 8.056 | 43.94 | 75.25 | 77.53 | 69.38 | 52.91 | 63.80 | 53.15 |
| NF | 3.25 | 7.683 | 42.66 | 75.63 | 77.97 | 70.48 | 54.92 | 64.33 | 55.82 |
| HQQ | 3.25 | 7.317 | 43.17 | 76.14 | 78.24 | 68.98 | 55.37 | 64.38 | 56.39 |
| HIGGS (p=2) | 3.25 | 7.110 | 44.11 | 76.35 | 77.09 | 73.09 | 55.77 | 65.28 | 57.56 |
| HIGGS (p=3) | 3.25 | 6.807 | 44.71 | 77.95 | 77.75 | 71.11 | 57.01 | 65.71 | 60.11 |
| HIGGS (p=4) | 3.25 | 6.643 | 47.27 | 78.41 | 78.45 | 70.72 | 56.97 | 66.36 | 59.88 |
| GPTQ | 3.25 | 7.133 | 41.13 | 72.81 | 75.14 | 71.51 | 53.86 | 62.89 | 58.37 |
| HIGGS (dyn data-free) | 3.25 | 6.388 | 47.10 | 79.12 | 78.78 | 71.59 | 57.09 | 66.74 | 61.62 |
| AF | 4.02 | 6.194 | 46.84 | 78.54 | 79.16 | 73.95 | 58.28 | 67.35 | 61.47 |
| NF | 4.02 | 6.225 | 47.95 | 79.38 | 79.27 | 73.24 | 58.44 | 67.66 | 62.65 |
| HQQ | 4.02 | 8.057 | 46.84 | 78.16 | 77.91 | 70.17 | 55.44 | 65.70 | 57.72 |
| HIGGS (p=1) | 4.02 | 6.142 | 47.27 | 79.63 | 78.78 | 72.45 | 58.29 | 67.28 | 61.74 |
| HIGGS (p=2) | 4.02 | 6.015 | 48.29 | 81.06 | 79.54 | 73.95 | 58.54 | 68.28 | 63.26 |
| HIGGS (p=3) | 4.02 | 5.981 | 50.17 | 80.26 | 80.30 | 73.72 | 59.17 | 68.73 | 62.83 |
| GPTQ | 4.02 | 6.238 | 45.82 | 78.66 | 78.02 | 72.53 | 56.91 | 66.39 | 62.96 |
| HIGGS (dyn data-free) | 4.00 | 5.910 | 49.23 | 80.98 | 79.38 | 72.85 | 59.00 | 68.29 | 63.86 |
| AF | 4.25 | 5.952 | 49.57 | 80.85 | 79.27 | 74.27 | 59.13 | 68.62 | 63.20 |
| NF | 4.25 | 5.964 | 49.32 | 80.81 | 78.94 | 73.40 | 59.16 | 68.33 | 64.10 |
| HQQ | 4.25 | 5.944 | 50.09 | 81.44 | 79.76 | 73.88 | 59.44 | 68.92 | 63.70 |
| HIGGS (p=1) | 4.26 | 5.978 | 50.26 | 80.98 | 79.54 | 73.24 | 58.96 | 68.60 | 63.47 |
| HIGGS (p=2) | 4.26 | 5.908 | 50.60 | 81.48 | 79.38 | 74.19 | 59.17 | 68.96 | 63.52 |
| HIGGS (p=3) | 4.25 | 5.872 | 49.57 | 81.27 | 79.38 | 72.38 | 59.33 | 68.39 | 64.24 |
| GPTQ | 4.25 | 5.923 | 47.18 | 79.59 | 79.16 | 72.22 | 58.43 | 67.32 | 64.06 |
| HIGGS (dyn data-free) | 4.25 | 5.831 | 50.43 | 81.27 | 79.43 | 72.85 | 59.33 | 68.66 | 64.06 |
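
The table does not say how the Avg column is computed. Checking the numbers suggests it is the arithmetic mean of the five zero-shot columns (ArcC, ArcE, PiQA, Wino, HellaS), with MMLU left out. The sketch below re-derives Avg for a handful of rows copied from the table. The names `ROWS`, `mean_zero_shot`, and `check_rows` are illustrative, and the averaging rule is an inference from the data, not something the source confirms.

```python
# Rows copied from the table above:
# (method, wbits, [ArcC, ArcE, PiQA, Wino, HellaS], reported Avg)
ROWS = [
    ("FP16", 16.00, [51.28, 81.52, 80.03, 73.72, 60.01], 69.31),
    ("HIGGS (p=4)", 3.25, [47.27, 78.41, 78.45, 70.72, 56.97], 66.36),
    ("HIGGS (dyn data-free)", 3.25, [47.10, 79.12, 78.78, 71.59, 57.09], 66.74),
    ("HQQ", 4.02, [46.84, 78.16, 77.91, 70.17, 55.44], 65.70),
    ("GPTQ", 4.25, [47.18, 79.59, 79.16, 72.22, 58.43], 67.32),
]


def mean_zero_shot(scores):
    """Arithmetic mean of the five zero-shot accuracies (assumed rule for Avg)."""
    return sum(scores) / len(scores)


def check_rows(rows, tol=0.01):
    """Return (method, recomputed Avg, reported Avg, matches) for each row."""
    results = []
    for method, _wbits, scores, reported in rows:
        computed = mean_zero_shot(scores)
        results.append((method, computed, reported, abs(computed - reported) <= tol))
    return results


if __name__ == "__main__":
    for method, computed, reported, ok in check_rows(ROWS):
        print(f"{method:24s} recomputed={computed:.2f} reported={reported:.2f} match={ok}")
```

Under this assumption every sampled row matches the reported Avg to within rounding, so MMLU should be read as a separate metric and not as part of the average.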