
Published on October 24

From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages

Categories: cs.LG, cs.AI, cs.NE

Release Date: October 24, 2024

Authors: Artur Kiulian (1), Anton Polishko (1), Mykola Khandoga (1), Yevhen Kostiuk (2), Guillermo Gabrielli (1), Łukasz Gagała, Fadi Zaraket (3), Qusai Abu Obaida (4), Hrishikesh Garud (5), Wendy Wing Yee Mak (6), Dmytro Chaplynskyi (1), Selma Belhadj Amor (6), Grigol Peradze (6)

Affiliations: (1) OpenBabylon; (2) OpenBabylon, ARG-Tech, University of Dundee, UK; (3) Doha Institute for Graduate Studies; (4) Arab Center for Research and Policy Studies; (5) Google; (6) PolyAgent

arXiv: https://arxiv.org/abs/2410.18836v1