Published on November 25

FineWeb-zhtw: Scalable Curation of Traditional Chinese Text Data from the Web

cs.CL
cs.DB

Release Date: November 25, 2024

Authors: Cheng-Wei Lin¹, Wan-Hsuan Hsieh¹, Kai-Xin Guan¹, Chan-Jan Hsu¹, Chia-Chen Kuo², Chuan-Lin Lai², Chung-Wei Chung², Ming-Jen Wang², Da-Shan Shiu¹

Affiliations: ¹MediaTek Research; ²National Applied Research Laboratories

arXiv: http://arxiv.org/abs/2411.16387v1