PhoneLM: An Efficient and Capable Small Language Model Family through Principled Pre-training
Categories: cs.CL, cs.AI, cs.LG
Release date: November 7, 2024
Authors: Rongjie Yi¹, Xiang Li¹, Weikai Xie¹, Zhenyan Lu¹, Chenghua Wang¹, Ao Zhou¹, Shangguang Wang¹, Xiwen Zhang², Mengwei Xu¹
Affiliations: ¹Institution 1; ²Institution 2

| Type | Dataset | Tokens |
|---|---|---|
| math instruct | MathInstruct (Yue et al., 2023) | 65.25M |
| chat instruct | UltraChat (Ding et al., 2023) | 1.775B |
| | OpenAssistant 2 (Köpf et al., 2024) | 42.25M |
| | OpenHermes (Teknium, 2023) | 77.25M |
| code instruct | Magicoder Evol Instruct (ise uiuc, 2024) | 30.25M |
| | CommitPackFT (Muennighoff et al., 2023) | 0.35B |
| | Magicoder OSS Instruct (Wei et al., 2023) | 43.5M |
| | SlimOrca (Lian et al., 2023) | 209.75M |
| Total | | 2.59B |
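As a quick consistency check on the table, the per-dataset counts can be summed and turned into mixture proportions. The Python sketch below assumes that the listed counts are exact, and it writes 1.775B and 0.35B as 1775M and 350M. The dictionary `TOKENS_M` and the function `mixture_shares` are illustrative names and do not come from the paper. Datasets are kept separate rather than grouped by type.

```python
# Token counts in millions, copied from the table above (illustrative check only).
TOKENS_M = {
    "MathInstruct": 65.25,
    "UltraChat": 1775.0,          # 1.775B
    "OpenAssistant 2": 42.25,
    "OpenHermes": 77.25,
    "Magicoder Evol Instruct": 30.25,
    "CommitPackFT": 350.0,        # 0.35B
    "Magicoder OSS Instruct": 43.5,
    "SlimOrca": 209.75,
}


def mixture_shares(tokens):
    """Return the total token count and each dataset's fraction of that total."""
    total = sum(tokens.values())
    return total, {name: n / total for name, n in tokens.items()}


total_m, shares = mixture_shares(TOKENS_M)
print(f"total: {total_m / 1000:.2f}B tokens")
# List datasets from largest to smallest share of the instruction mix.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24} {share:6.1%}")
```

The per-dataset counts add up to 2593.25M tokens, which agrees with the 2.59B total in the table. The check also shows that UltraChat alone makes up roughly two thirds of the instruction data.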