Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?
cs.CL
Release Date: November 27, 2024
Authors: Lewen Yang, Xuanyu Zhou, Juao Fan, Xinyi Xie, Shengxin Zhu
Affiliation: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, China

| Model | Params | GLUE Avg. | SQuAD 1.1 F1/EM | SQuAD 2.0 F1/EM |
| --- | --- | --- | --- | --- |
| BERT (Devlin et al., 2018)[1] | 340M(334M) | 82.1 | 93.2/87.4 | 83.1/80.0 |
| XLNet (Yang et al., 2019)[2] | 355M | 90.5 | *95.1/*89.7 | 90.7/87.9 |
| RoBERTa (Liu et al., 2019)[3] | 355M | 88.5 | *94.6/*88.9 | 89.8/86.8 |
| StructBERT (Wang et al., 2019)[4] | 340M | 83.9 | *92.0/*85.2 | - |
| ALBERT (Lan et al., 2019)[5] | 235M | 89.4 | 95.5/90.1 | 91.4/88.9 |
| DistilBERT (Sanh et al., 2020)[6] | 66M | 77.0 | *86.9/*79.1 | - |
| BART (Lewis et al., 2020)[7] | 374M | - | 94.6/88.8 | 89.2/86.1 |
| ELECTRA (Clark et al., 2020)[8] | 335M | 89.5 | *94.9/*89.7 | 91.4/88.7 |
| Funnel-Transformer (Dai et al., 2020)[9] | 488M | 89.7 | *94.7/*89.0 | *90.4/*87.6 |
| SpanBERT (Joshi et al., 2020)[10] | 340M | 82.8 | 94.6/88.8 | 88.7/85.7 |
| ConvBERT (Jiang et al., 2020)[11] | 106M | 86.4 | 90.0/84.7 | 83.1/80.6 |
| MPNet (Song et al., 2020)[12] | 110M | 86.5 | *92.7/*86.9 | 85.8/82.8 |
| LUKE (Yamada et al., 2020)[13] | 483M | - | 95.4/90.2 | - |
| UNILMv2 (Bao et al., 2020)[14] | 110M | 87.3 | 92.0/85.6 | 83.6/80.9 |
| DeBERTa (He et al., 2021)[15] | 433M | 90.0 | 95.5/90.1 | 90.7/88.0 |
a. Results on the dev set are marked with “*”.
b. Results missing from the literature are marked with “-”.
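The SQuAD columns above report F1 and exact match (EM) between predicted and gold answer spans. As a minimal sketch of how these two metrics are computed, the snippet below implements token-level EM and bag-of-tokens F1; the official SQuAD evaluation script additionally strips punctuation and English articles during normalization and takes the maximum score over multiple gold answers, both omitted here for brevity.

```python
from collections import Counter

def normalize(text: str) -> list[str]:
    # Simplified normalization: lowercase and whitespace-tokenize.
    # (The official script also removes punctuation and articles.)
    return text.lower().split()

def exact_match(pred: str, gold: str) -> float:
    # 1.0 if the normalized prediction equals the normalized gold answer.
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    # Bag-of-tokens F1: harmonic mean of token precision and recall.
    pred_toks, gold_toks = normalize(pred), normalize(gold)
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the Eiffel Tower", "The Eiffel Tower"))        # 1.0
print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))      # ~0.571
```

Benchmark scores are reported as the average of these per-question scores over the full dev or test set, which is why F1 is always greater than or equal to EM in the table.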