notesum.ai

Published on December 4

AntLM: Bridging Causal and Masked Language Models

cs.CL

Released Date: December 4, 2024

Authors: Xinru Yu¹, Bin Guo¹, Shiwei Luo², Jie Wang¹, Tao Ji³, Yuanbin Wu¹

Affiliations: ¹East China Normal University; ²Harbin Engineering University; ³Fudan University

arXiv: http://arxiv.org/pdf/2412.03275v1

Model	Data	BLiMP	BLiMP Supplement	EWoK	GLUE	Macro Avg.
BabyLlama†	10M	69.8	59.5	50.7	63.3	60.8
BabyLlama	10M	68.1	60.4	50.4	65.5	61.1
AntLM$_{\text{BabyLlama}}$	10M	69.4	60.7	51.1	67.4	62.1
BabyLlama†	100M	73.1	60.6	52.1	69.0	63.7
LTG-BERT†	100M	69.2	66.5	51.9	68.4	64.0
BabyLlama	100M	74.9	66.0	52.0	66.3	64.8
LTG-BERT†	10M	60.6	60.8	48.9	60.3	57.5
LTG-BERT	10M	62.6	65.4	62.3	64.9	63.8
AntLM$_{\text{LTG-BERT}}$	10M	72.3	62.6	63.0	66.0	66.0
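
The Macro Avg. column appears to be the unweighted mean of the four benchmark scores in each row. The following is a minimal sketch under that assumption, using three rows copied from the table above; the function name `macro_average` and the dictionary keys are illustrative and do not come from the paper.

```python
# Scores copied from the table above, in column order:
# (BLiMP, BLiMP Supplement, EWoK, GLUE).
scores = {
    "AntLM (BabyLlama base), 10M": (69.4, 60.7, 51.1, 67.4),
    "AntLM (LTG-BERT base), 10M": (72.3, 62.6, 63.0, 66.0),
    "BabyLlama, 100M": (74.9, 66.0, 52.0, 66.3),
}


def macro_average(values):
    """Unweighted mean of per-benchmark scores (assumed definition)."""
    return sum(values) / len(values)


# Recompute the macro average for each selected row.
macro = {name: macro_average(vals) for name, vals in scores.items()}

for name, value in macro.items():
    print(f"{name}: {value:.2f}")
```

For these rows the recomputed means are about 62.15, 65.98, and 64.80, which agree with the reported 62.1, 66.0, and 64.8 to within rounding at one decimal place.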