BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits
Categories: cs.CL, cs.AI, cs.NE
Release date: December 6, 2024
Authors: Wazib Ansar¹, Saptarsi Goswami², Amlan Chakrabarti¹
Affiliations: ¹University of Calcutta; ²Bangabasi Morning College

| Model | #Bits (W-A) | Size (MB) ↓ | SST-2 ↑ | CoLA ↑ | MRPC ↑ | RTE ↑ |
|---|---|---|---|---|---|---|
| **Quantized Models without EE** | | | | | | |
| Q-BERT [5] | 8-8 | 43 | 84.6 | - | 68.3 | 52.7 |
| TernaryBERT [29] | 2-8 | 28 | - | 50.7 | 87.5 | 68.2 |
| BinaryBERT [17] | 1-8 | 16.5 | 92.6 | 53.4 | 85.5 | 72.2 |
| BinaryBERT [17] | 1-4 | 16.5 | 92.3 | 44.4 | 83.3 | 65.3 |
| BinaryBERT [17] | 1-2 | 16.5 | 82.5 | 14.6 | 68.3 | 52.7 |
| BinaryBERT [17] | 1-1 | 16.5 | 53.2 | 0 | 68.3 | 52.7 |
| BiBERT [18] | 1-1 | 13.4 | 88.7 | 25.4 | 72.5 | 57.4 |
| BiT (WMS) [19] | 1-4 | 13.4 | 91.5 | 42.0 | 86.8 | 66.4 |
| BiT (WMS) [19] | 1-2 | 13.4 | 90.8 | 32.1 | 78.4 | 58.1 |
| BiT (WMS) [19] | 1-1 | 13.4 | 87.7 | 25.1 | 79.7 | 58.8 |
| BiT [19] | 1-1 | 13.4 | 89.9 | 32.9 | 79.9 | 62.1 |
| **Full-precision EE Models** | | | | | | |
| ElasticBERT_BASE [23] | 32-32 | 416 | - | - | 87.9 | - |
| ElasticBERT_BASE-6L [23] | 32-32 | 256 | - | - | 87.3 | - |
| BE3R_BERT [13] | 32-32 | 1297 | 93.23 | 59.81 | 90.27 | 71.48 |
| BE3R_Electra [13] | 32-32 | 1278 | 96.79 | 68.26 | 91.47 | 86.28 |
| PABEE [22] | 32-32 | 46 | 93.0 | 61.2 | 90.0 | 80.1 |
| **Proposed Model and Ablations** | | | | | | |
| BEExformer (WEE) | 1-1 | 11.05 | 85.77 | 53.21 | 82.30 | 64.98 |
| BEExformer (WEE-FP) | 32-32 | 102.41 | 90.71 | 57.43 | 84.27 | 67.15 |
| BEExformer (FP) | 32-32 | 263 | 95.10 | 62.70 | 90.92 | 75.81 |
| BEExformer (Proposed) | 1-1 | 14.26 | 92.32 | 60.30 | 86.47 | 71.11 |

Note: EE: early exit; W-A: weight/activation bit widths; WMS: without multi-distillation; WEE: without the proposed EE; WEE-FP: full precision without EE; FP: full precision. '-' indicates that the value is not available. Each dataset is evaluated with its default metric. ↑ indicates that a higher value is desired, while ↓ indicates that a lower value is desired. Bold values indicate the best results within each category.
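The size column invites a quick back-of-the-envelope check: storing a 32-bit weight in 1 bit would ideally shrink storage 32×. The sketch below re-uses only numbers from the table above to compute the observed size ratios and score changes for the BEExformer ablations. The dictionary layout and the helper names (`ideal_ratio`, `compression_ratio`, `score_deltas`) are illustrative assumptions, not code from the paper.

```python
"""Back-of-the-envelope checks on the BEExformer ablation rows.

All numbers are copied from the results table; the dictionary layout and
helper names are hypothetical and do not come from the paper.
"""

TASKS = ("SST-2", "CoLA", "MRPC", "RTE")

# Model name -> ((weight bits, activation bits), size in MB, per-task scores)
ROWS = {
    "BEExformer (WEE)": ((1, 1), 11.05, (85.77, 53.21, 82.30, 64.98)),
    "BEExformer (WEE-FP)": ((32, 32), 102.41, (90.71, 57.43, 84.27, 67.15)),
    "BEExformer (FP)": ((32, 32), 263.0, (95.10, 62.70, 90.92, 75.81)),
    "BEExformer (Proposed)": ((1, 1), 14.26, (92.32, 60.30, 86.47, 71.11)),
}


def ideal_ratio(full: str, binary: str) -> float:
    """Storage ratio if every weight shrank from the full to the binary bit width."""
    return ROWS[full][0][0] / ROWS[binary][0][0]


def compression_ratio(full: str, binary: str) -> float:
    """Observed size ratio between two rows of the table."""
    return ROWS[full][1] / ROWS[binary][1]


def score_deltas(full: str, binary: str) -> dict:
    """Per-task score change (binary minus full precision), in metric points."""
    return {
        task: round(b - f, 2)
        for task, f, b in zip(TASKS, ROWS[full][2], ROWS[binary][2])
    }


if __name__ == "__main__":
    for full, binary in [
        ("BEExformer (FP)", "BEExformer (Proposed)"),
        ("BEExformer (WEE-FP)", "BEExformer (WEE)"),
    ]:
        print(f"{binary} vs {full}:")
        print(f"  size ratio {compression_ratio(full, binary):.1f}x "
              f"(ideal {ideal_ratio(full, binary):.0f}x)")
        print(f"  score change {score_deltas(full, binary)}")
```

With the table's figures, the full-precision to binarized size ratio is about 18.4× with early exits and about 9.3× without them, both below the ideal 32×; the table alone does not show which components account for the gap.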