AntLM: Bridging Causal and Masked Language Models
cs.CL
Released Date: December 4, 2024
Authors: Xinru Yu¹, Bin Guo¹, Shiwei Luo², Jie Wang¹, Tao Ji³, Yuanbin Wu¹
Affiliations: ¹East China Normal University; ²Harbin Engineering University; ³Fudan University

| Model | Data | BLiMP | BLiMP Supplement | EWoK | GLUE | Macro Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| BabyLlama† | 10M | 69.8 | 59.5 | 50.7 | 63.3 | 60.8 |
| BabyLlama | 10M | 68.1 | 60.4 | 50.4 | 65.5 | 61.1 |
| AntLM | 10M | 69.4 | 60.7 | 51.1 | 67.4 | 62.1 |
| BabyLlama† | 100M | 73.1 | 60.6 | 52.1 | 69.0 | 63.7 |
| LTG-BERT† | 100M | 69.2 | 66.5 | 51.9 | 68.4 | 64.0 |
| BabyLlama | 100M | 74.9 | 66.0 | 52.0 | 66.3 | 64.8 |
| LTG-BERT† | 10M | 60.6 | 60.8 | 48.9 | 60.3 | 57.5 |
| LTG-BERT | 10M | 62.6 | 65.4 | 62.3 | 64.9 | 63.8 |
| AntLM | 10M | 72.3 | 62.6 | 63.0 | 66.0 | 66.0 |
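
The last column of the table can be sanity-checked against the four task-score columns. The sketch below assumes the macro average is the unweighted mean of the four per-task scores; the source does not state the formula, and `macro_average` is a hypothetical helper name used only for illustration. The scores are copied from the first and last rows of the table.

```python
from statistics import mean


def macro_average(task_scores):
    """Unweighted mean of per-task scores (assumed definition of the macro average)."""
    return mean(task_scores)


# (four task scores, reported macro average), copied from the table above.
rows = {
    "BabyLlama† (10M), first row": ([69.8, 59.5, 50.7, 63.3], 60.8),
    "AntLM (10M), last row": ([72.3, 62.6, 63.0, 66.0], 66.0),
}

for name, (scores, reported) in rows.items():
    computed = macro_average(scores)
    # Reported values are rounded to one decimal, so allow a small gap.
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Under this assumption the computed means agree with the reported values for these two rows once rounded to one decimal place. Because the table entries are themselves rounded, a gap of about 0.1 on other rows would not by itself indicate a different formula.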