Training Data for Large Language Model
cs.AI
Release Date: November 12, 2024
Authors: Yiming Ju, Huanhuan Ma
| Dataset Name | Number of Samples | License |
|---|---|---|
| InstructionWild_v2 | 110K | MIT |
| LCCC | 12M | MIT |
| Chatbot_arena_conversations | 33K | CC-BY-NC-4.0 |
| Zhihu-KOL | 1M | MIT |
| Chinese-Medical-DIALOGUE-Data | 792K | MIT |
| OpenChat | 70K | MIT |
| ShareGPT-Chinese-English-90k | 90K | Apache-2.0 |
| HuatuoGPT-SFT-data-v1 | 226K | Apache-2.0 |
| Huatuo-26M | 26.5M | Apache-2.0 |
| LMSYS-Chat-1M | 1M | LMSYS-Chat-1M license |
| WildChat | 1.04M | AI2 ImpACT license |
| MMIQC | 2.3M | Apache-2.0 |