Bi-Mamba: Towards Accurate 1-Bit State Space Models
Categories: cs.CL, cs.AI
Released Date: November 18, 2024
Authors: Shengkun Tang (1), Liqun Ma (1), Haonan Li (1), Mingjie Sun (2), Zhiqiang Shen (1)
Aff.: (1) Mohamed bin Zayed University of Artificial Intelligence; (2) Carnegie Mellon University

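Columns BoolQ through Avg. report zero-shot accuracy; Wiki2, PTB, and C4 report perplexity.

| Method | Model | Size | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | Avg. | Wiki2 | PTB | C4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |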
| Mamba-2 (Dao and Gu, 2024) | M | 780M | 61.5 | 71.8 | 54.9 | 60.2 | 54.3 | 28.5 | 36.2 | 52.5 | 11.8 | 20.0 | 16.5 |
| GPTQ-3bit | M | 780M | 44.6 | 62.9 | 40.3 | 53.3 | 40.6 | 26.4 | 30.6 | 42.6 | 152.5 | 192.5 | 186.0 |
| GPTQ-2bit | M | 780M | 40.4 | 52.3 | 25.7 | 51.3 | 25.6 | 25.1 | 30.2 | 35.2 | 1.6e+8 | 1.3e+8 | 7.3e+7 |
| BiLLM | M | 780M | 54.1 | 52.9 | 26.9 | 50.6 | 28.5 | 26.5 | 27.2 | 38.1 | 1.8e+4 | 2.4e+4 | 1.5e+4 |
| Bi-Mamba | M | 780M | 59.1 | 66.4 | 39.6 | 52.2 | 41.2 | 22.8 | 30.0 | 44.5 | 14.2 | 34.4 | 15.0 |
| TinyLLaMA (Zhang et al., 2024) | T | 1.3B | 57.8 | 73.3 | 59.2 | 59.1 | 55.3 | 30.1 | 36.0 | 53.0 | 7.8 | 30.5 | 9.9 |
| OPT (Zhang et al., 2022) | T | 1.3B | 57.8 | 72.5 | 53.7 | 59.5 | 51.0 | 29.5 | 33.4 | 51.1 | 14.6 | 20.3 | 16.1 |
| Mamba-2 (Dao and Gu, 2024) | M | 1.3B | 64.3 | 73.7 | 59.9 | 61.0 | 60.4 | 33.1 | 37.8 | 55.8 | 10.4 | 17.7 | 14.8 |
| GPTQ-3bit | M | 1.3B | 56.8 | 68.2 | 48.5 | 54.4 | 48.0 | 28.8 | 30.4 | 47.8 | 29.3 | 56.5 | 37.3 |
| GPTQ-2bit | M | 1.3B | 42.0 | 49.9 | 25.7 | 49.6 | 26.4 | 26.1 | 27.6 | 35.3 | 1.2e+6 | 1.0e+6 | 1.3e+6 |
| BiLLM | M | 1.3B | 40.1 | 55.4 | 29.6 | 50.7 | 30.6 | 21.8 | 25.4 | 36.2 | 4943.2 | 3540.8 | 4013.6 |
| Bi-Mamba | M | 1.3B | 62.0 | 69.2 | 43.1 | 53.7 | 43.9 | 24.4 | 31.2 | 46.7 | 12.6 | 28.9 | 13.6 |
| Mamba-2 (Dao and Gu, 2024) | M | 2.7B | 70.7 | 76.3 | 66.6 | 63.9 | 64.8 | 36.3 | 38.8 | 59.6 | 9.1 | 15.3 | 13.3 |
| GPTQ-3bit | M | 2.7B | 54.8 | 69.9 | 54.0 | 56.0 | 51.6 | 33.3 | 32.8 | 50.3 | 21.2 | 39.0 | 29.3 |
| GPTQ-2bit | M | 2.7B | 45.4 | 49.8 | 25.8 | 52.0 | 25.8 | 25.8 | 26.0 | 35.8 | 2.1e+5 | 2.3e+5 | 1.8e+5 |
| BiLLM | M | 2.7B | 52.8 | 53.8 | 27.7 | 53.0 | 29.1 | 25.1 | 28.2 | 38.5 | 8707.0 | 1.7e+4 | 1.3e+4 |
| Bi-Mamba | M | 2.7B | 60.2 | 71.2 | 51.4 | 52.8 | 49.1 | 27.0 | 33.6 | 49.3 | 10.7 | 24.2 | 11.9 |
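
The Avg. column can be sanity-checked against the seven task scores. The sketch below assumes Avg. is the unweighted mean of BoolQ, PIQA, HS, WG, ARC-e, ARC-c, and OBQA; the table does not state this, but the listed values are consistent with it. The names `ROWS`, `mean_accuracy`, and `check_averages` are illustrative, and the 0.1 tolerance is an assumption to absorb the one-decimal rounding of the published numbers. Only the Mamba-2 and Bi-Mamba rows are copied in.

```python
# Seven zero-shot accuracies (BoolQ, PIQA, HS, WG, ARC-e, ARC-c, OBQA)
# followed by the listed Avg., copied from the table rows above.
ROWS = {
    ("Mamba-2", "780M"): ([61.5, 71.8, 54.9, 60.2, 54.3, 28.5, 36.2], 52.5),
    ("Bi-Mamba", "780M"): ([59.1, 66.4, 39.6, 52.2, 41.2, 22.8, 30.0], 44.5),
    ("Mamba-2", "1.3B"): ([64.3, 73.7, 59.9, 61.0, 60.4, 33.1, 37.8], 55.8),
    ("Bi-Mamba", "1.3B"): ([62.0, 69.2, 43.1, 53.7, 43.9, 24.4, 31.2], 46.7),
    ("Mamba-2", "2.7B"): ([70.7, 76.3, 66.6, 63.9, 64.8, 36.3, 38.8], 59.6),
    ("Bi-Mamba", "2.7B"): ([60.2, 71.2, 51.4, 52.8, 49.1, 27.0, 33.6], 49.3),
}


def mean_accuracy(scores):
    """Unweighted mean of the per-task accuracies (assumed definition of Avg.)."""
    return sum(scores) / len(scores)


def check_averages(rows, tol=0.1):
    """Map each (method, size) to (recomputed mean, listed Avg., within tolerance)."""
    result = {}
    for key, (scores, listed) in rows.items():
        mean = mean_accuracy(scores)
        result[key] = (mean, listed, abs(mean - listed) <= tol)
    return result


if __name__ == "__main__":
    for (method, size), (mean, listed, ok) in check_averages(ROWS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{method:9s} {size:5s} recomputed={mean:.2f} listed={listed:.1f} {status}")
```

Because the per-task scores are themselves rounded, a recomputed mean can differ from the listed Avg. by up to about 0.1 without indicating an error in the table.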