Why do language models perform worse for morphologically complex languages?
cs.CL
Release date: November 21, 2024
Authors: Catherine Arnett¹, Benjamin K. Bergen
Affiliation: ¹Department of Linguistics, University of California San Diego
| Language | ISO 639-3 | Writing System (ISO 15924) | Language Family | Morphological Type | Num. Items |
|---|---|---|---|---|---|
| Armenian | hye | armn | Indo-European | agglutinative | 2000 |
| Basque | eus | latn | Basque | agglutinative | 2000 |
| Bulgarian | bul | cyrl | Indo-European | fusional | 2000 |
| Cebuano | ceb | latn | Austronesian | agglutinative | 131 |
| English | eng | latn | Indo-European | fusional | 2000 |
| Georgian | kat | geor | Kartvelian | agglutinative | 200 |
| Greek | ell | grek | Indo-European | fusional | 112 |
| Gujarati | guj | gujr | Indo-European | fusional | 547 |
| Hungarian | hun | latn | Uralic | agglutinative | 2000 |
| Icelandic | isl | latn | Indo-European | fusional | 1852 |
| Indonesian | ind | latn | Austronesian | agglutinative | 1552 |
| Irish | gle | latn | Indo-European | fusional | 1877 |
| Japanese | jpn | jpan | Japonic | agglutinative | 2000 |
| Korean | kor | hang | Koreanic | agglutinative | 2000 |
| Northern Kurdish | kmr | latn | Indo-European | fusional | 319 |
| Persian | pes | arab | Indo-European | fusional | 2000 |
| Slovenian | slv | latn | Indo-European | fusional | 2000 |
| Spanish | spa | latn | Indo-European | fusional | 2000 |
| Tamil | tam | taml | Dravidian | agglutinative | 884 |
| Turkish | tur | latn | Turkic | agglutinative | 2000 |
| Urdu | urd | arab | Indo-European | fusional | 1649 |
| Zulu | zul | latn | Niger-Congo | agglutinative | 2000 |