Crystal: Illuminating LLM Abilities on Language and Code
Categories: cs.SE, cs.AI, cs.CL
Release Date: November 6, 2024
Authors: Tianhua Tao¹, Junbo Li², Bowen Tan³, Hongyi Wang³, William Marshall⁴, Bhargav M. Kanakiya⁴, Joel Hestness⁴, Natalia Vassilieva⁴, Zhiqiang Shen², Eric P. Xing², Zhengzhong Liu²
Affiliations: ¹University of Illinois Urbana-Champaign, Champaign, Illinois, United States; ²Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; ³Carnegie Mellon University, Pittsburgh, Pennsylvania, United States; ⁴Cerebras Systems, Sunnyvale, California, United States

| Benchmark | Crystal Phase 1 | Crystal Phase 2 | Crystal Adapt. | Llama 2 | Code Llama | OLMo | StarCoder |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Natural Language Benchmarks** | | | | | | | |
| ARC-easy (0-shot) | 64.73 | 70.75 | 67.34 | 74.50 | 62.29 | 68.51 | 50.17 |
| ARC-challenge (0-shot) | 37.54 | 42.58 | 38.91 | 46.16 | 35.24 | 40.27 | 27.73 |
| ARC-challenge (25-shot) | 42.83 | 47.44 | 47.01 | 53.33 | 42.75 | 45.93 | 32.16 |
| Openbook QA (0-shot) | 39.60 | 41.20 | 39.80 | 44.20 | 36.80 | 42.60 | 32.20 |
| TruthfulQA (5-shot) | 38.96 | 36.47 | 35.91 | 38.95 | 37.19 | 35.92 | 41.36 |
| MMLU (0-shot) | 28.05 | 42.46 | 42.33 | 41.71 | 34.76 | 28.19 | 27.55 |
| MMLU (5-shot) | 25.72 | 48.42 | 48.78 | 46.40 | 39.98 | 28.12 | 28.45 |
| HellaSwag (0-shot) | 69.65 | 72.89 | 70.35 | 75.92 | 62.80 | 75.56 | 46.65 |
| HellaSwag (10-shot) | 71.62 | 74.38 | 71.97 | 78.50 | 64.74 | 77.12 | 48.36 |
| RACE (0-shot) | 38.57 | 38.18 | 38.18 | 39.52 | 39.52 | 38.37 | 31.67 |
| PIQA (0-shot) | 75.84 | 78.07 | 76.77 | 78.78 | 72.58 | 79.92 | 65.61 |
| COPA (0-shot) | 86.00 | 83.00 | 80.00 | 87.00 | 80.00 | 88.00 | 67.00 |
| BoolQ (0-shot) | 66.64 | 74.43 | 72.36 | 78.07 | 74.65 | 72.66 | 57.16 |
| Winogrande (0-shot) | 63.14 | 67.01 | 65.51 | 69.38 | 65.51 | 67.24 | 55.10 |
| Winogrande (5-shot) | 64.80 | 68.82 | 67.40 | 73.64 | 65.75 | 68.90 | 56.04 |
| GSM8K (5-shot) | 2.12 | 12.36 | 10.39 | 14.71 | 11.15 | 4.09 | 9.02 |
| **Code Benchmarks** | | | | | | | |
| HumanEval (pass@1) | 7.44 | 23.90 | 28.38 | 13.05 | 30.06 | 14.02 | 33.63 |
| HumanEval (pass@10) | 14.64 | 45.12 | 52.76 | 22.61 | 58.36 | 24.56 | 59.38 |
| MBPP (pass@1) | 8.92 | 30.99 | 36.37 | 20.09 | 39.20 | 14.40 | 52.70 |
| MBPP (pass@10) | 17.24 | 58.62 | 56.37 | 34.69 | 64.00 | 26.42 | 65.44 |
| MultiPL-E Bash (pass@1) | 0.00 | 10.76 | 6.96 | 2.53 | 10.13 | 1.26 | 10.12 |
| MultiPL-E C++ (pass@1) | 6.83 | 24.22 | 23.60 | 6.83 | 26.08 | 13.04 | 29.81 |
| MultiPL-E C# (pass@1) | 3.17 | 17.09 | 17.09 | 6.32 | 23.41 | 8.86 | 20.88 |
| MultiPL-E Java (pass@1) | 3.17 | 22.79 | 27.21 | 11.39 | 33.54 | 13.29 | 29.74 |
| MultiPL-E JS (pass@1) | 9.94 | 29.19 | 29.81 | 12.42 | 35.40 | 14.91 | 31.05 |
| MultiPL-E PHP (pass@1) | 4.97 | 20.50 | 20.50 | 9.94 | 24.22 | 7.45 | 27.32 |
| MultiPL-E TS (pass@1) | 10.06 | 25.15 | 30.18 | 13.21 | 32.70 | 12.57 | 33.96 |
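The pass@1 and pass@10 figures in the code benchmarks above are conventionally computed with the unbiased pass@k estimator introduced with HumanEval: generate n samples per problem, count the c samples that pass all unit tests, and estimate the probability that at least one of k drawn samples passes. A minimal sketch (assuming the standard estimator; the paper's exact evaluation harness and sample count n are not specified here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: number of samples that pass all unit tests
    k: evaluation budget (e.g. 1 or 10 in the table above)
    """
    if n - c < k:
        # Fewer failing samples than the budget: some draw must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score is the mean of pass_at_k over all problems.
def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """results: list of (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with n = 10 samples of which c = 5 pass, pass@1 is 0.5 (half the single draws succeed), while pass@10 is 1.0 (drawing all 10 samples necessarily includes a passing one).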