Published on November 29

ChineseWebText 2.0: Large-Scale High-quality Chinese Web Text with Multi-dimensional and fine-grained information

cs.CL
cs.AI

Release Date: November 29, 2024

Authors: Wanyue Zhang¹, Ziyong Li¹, Wen Yang¹, Chunlin Leng¹, Yinan Bai¹, Qianlong Du¹, Chengqing Zong¹, Jiajun Zhang²

Affiliations:
¹ Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
² Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Wuhan AI Research

arXiv: http://arxiv.org/pdf/2411.19668v1