LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
cs.CL
cs.AI
Released Date: November 17, 2024
Authors: Jan Pfister¹, Julia Wunderle¹, Andreas Hotho¹
Aff.: ¹Julius-Maximilians-Universität Würzburg (JMU)
![Sheep image (uncaptioned)](https://arxiv.org/html/2411.11171v1/extracted/6005823/pics/sheep.png)
| Tokenizer | Token Count |
|---|---|
| word count | 46,509,357 |
| german-gpt2 | 78,151,205 |
| gbert-large | 79,969,101 |
| ours 1TB | 105,481,995 |
| ours 2023-2021 | 96,459,503 |
| ours 2023_14 | 81,993,239 |
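
One way to read the table is as tokens produced per word (often called tokenizer fertility). The sketch below is illustrative only: it assumes the "word count" row is the number of words in the same text that every tokenizer was applied to, which the table does not state explicitly. The names `WORD_COUNT`, `TOKEN_COUNTS`, and `fertility` are hypothetical; the numbers are copied from the table.

```python
# Figures copied from the table above.
# Assumption: "word count" is the word total of the same text
# that each tokenizer was run on.
WORD_COUNT = 46_509_357

TOKEN_COUNTS = {
    "german-gpt2": 78_151_205,
    "gbert-large": 79_969_101,
    "ours 1TB": 105_481_995,
    "ours 2023-2021": 96_459_503,
    "ours 2023_14": 81_993_239,
}

# Tokens per word: lower values mean the tokenizer splits words
# into fewer pieces on this text.
fertility = {name: count / WORD_COUNT for name, count in TOKEN_COUNTS.items()}

if __name__ == "__main__":
    # Print tokenizers from fewest to most tokens per word.
    for name, ratio in sorted(fertility.items(), key=lambda item: item[1]):
        print(f"{name:<16} {ratio:.3f} tokens/word")
```

Under that assumption, every tokenizer listed lands between roughly 1.7 and 2.3 tokens per word, with `ours 2023_14` closest to the two reference tokenizers.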