notesum.ai

Published on December 3

Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset

cs.CL

Release Date: December 3, 2024

Authors: Dan Su¹, Kezhi Kong, Ying Lin, Joseph Jennings, Brandon Norick, Markus Kliegl, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro

Affiliation: ¹NVIDIA

arXiv: http://arxiv.org/pdf/2412.02595v1