Crystal: Illuminating LLM Abilities on Language and Code
Categories: cs.SE, cs.AI, cs.CL
Release Date: November 6, 2024
Authors: Tianhua Tao¹, Junbo Li², Bowen Tan³, Hongyi Wang³, William Marshall⁴, Bhargav M. Kanakiya⁴, Joel Hestness⁴, Natalia Vassilieva⁴, Zhiqiang Shen², Eric P. Xing², Zhengzhong Liu²
Affiliations: ¹University of Illinois Urbana-Champaign, Champaign, Illinois, United States; ²Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates; ³Carnegie Mellon University, Pittsburgh, Pennsylvania, United States; ⁴Cerebras Systems, Sunnyvale, California, United States

| Benchmark | Crystal Phase 1 | Crystal Phase 2 | Crystal Adapt. | Llama 2 | Code Llama | OLMo | StarCoder |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Natural Language Benchmarks** | | | | | | | |
| ARC-easy (0-shot) | 64.73 | 70.75 | 67.34 | 74.50 | 62.29 | 68.51 | 50.17 |
| ARC-challenge (0-shot) | 37.54 | 42.58 | 38.91 | 46.16 | 35.24 | 40.27 | 27.73 |
| ARC-challenge (25-shot) | 42.83 | 47.44 | 47.01 | 53.33 | 42.75 | 45.93 | 32.16 |
| Openbook QA (0-shot) | 39.60 | 41.20 | 39.80 | 44.20 | 36.80 | 42.60 | 32.20 |
| TruthfulQA (5-shot) | 38.96 | 36.47 | 35.91 | 38.95 | 37.19 | 35.92 | 41.36 |
| MMLU (0-shot) | 28.05 | 42.46 | 42.33 | 41.71 | 34.76 | 28.19 | 27.55 |
| MMLU (5-shot) | 25.72 | 48.42 | 48.78 | 46.40 | 39.98 | 28.12 | 28.45 |
| HellaSwag (0-shot) | 69.65 | 72.89 | 70.35 | 75.92 | 62.80 | 75.56 | 46.65 |
| HellaSwag (10-shot) | 71.62 | 74.38 | 71.97 | 78.50 | 64.74 | 77.12 | 48.36 |
| RACE (0-shot) | 38.57 | 38.18 | 38.18 | 39.52 | 39.52 | 38.37 | 31.67 |
| PIQA (0-shot) | 75.84 | 78.07 | 76.77 | 78.78 | 72.58 | 79.92 | 65.61 |
| COPA (0-shot) | 86.00 | 83.00 | 80.00 | 87.00 | 80.00 | 88.00 | 67.00 |
| BoolQ (0-shot) | 66.64 | 74.43 | 72.36 | 78.07 | 74.65 | 72.66 | 57.16 |
| Winogrande (0-shot) | 63.14 | 67.01 | 65.51 | 69.38 | 65.51 | 67.24 | 55.10 |
| Winogrande (5-shot) | 64.80 | 68.82 | 67.40 | 73.64 | 65.75 | 68.90 | 56.04 |
| GSM8K (5-shot) | 2.12 | 12.36 | 10.39 | 14.71 | 11.15 | 4.09 | 9.02 |
| **Code Benchmarks** | | | | | | | |
| HumanEval (pass@1) | 7.44 | 23.90 | 28.38 | 13.05 | 30.06 | 14.02 | 33.63 |
| HumanEval (pass@10) | 14.64 | 45.12 | 52.76 | 22.61 | 58.36 | 24.56 | 59.38 |
| MBPP (pass@1) | 8.92 | 30.99 | 36.37 | 20.09 | 39.20 | 14.40 | 52.70 |
| MBPP (pass@10) | 17.24 | 58.62 | 56.37 | 34.69 | 64.00 | 26.42 | 65.44 |
| MultiPL-E Bash (pass@1) | 0.00 | 10.76 | 6.96 | 2.53 | 10.13 | 1.26 | 10.12 |
| MultiPL-E C++ (pass@1) | 6.83 | 24.22 | 23.60 | 6.83 | 26.08 | 13.04 | 29.81 |
| MultiPL-E C# (pass@1) | 3.17 | 17.09 | 17.09 | 6.32 | 23.41 | 8.86 | 20.88 |
| MultiPL-E Java (pass@1) | 3.17 | 22.79 | 27.21 | 11.39 | 33.54 | 13.29 | 29.74 |
| MultiPL-E JS (pass@1) | 9.94 | 29.19 | 29.81 | 12.42 | 35.40 | 14.91 | 31.05 |
| MultiPL-E PHP (pass@1) | 4.97 | 20.50 | 20.50 | 9.94 | 24.22 | 7.45 | 27.32 |
| MultiPL-E TS (pass@1) | 10.06 | 25.15 | 30.18 | 13.21 | 32.70 | 12.57 | 33.96 |
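The pass@1 and pass@10 figures in the code benchmarks above are conventionally computed with the unbiased pass@k estimator introduced with HumanEval: generate n samples per problem, count the c samples that pass all unit tests, and estimate the probability that at least one of k drawn samples passes. A minimal sketch (assuming the standard estimator; the paper's exact evaluation harness and sample count n are not specified here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: number of samples that pass all unit tests
    k: evaluation budget (e.g. 1 or 10 in the table above)
    """
    if n - c < k:
        # Fewer failing samples than the budget: some draw must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score is the mean of pass_at_k over all problems.
def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """results: list of (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with n = 10 samples of which c = 5 pass, pass@1 is 0.5 (half the single draws succeed), while pass@10 is 1.0 (drawing all 10 samples necessarily includes a passing one).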