notesum.ai

Published on November 9

Zyda-2: a 5 Trillion Token High-Quality Dataset

cs.CL
cs.AI

Released Date: November 9, 2024

Authors: Yury Tokpanov¹, Paolo Glorioso¹, Quentin Anthony, Beren Millidge

Aff.: ¹Zyphra, Palo Alto, CA

arXiv: http://arxiv.org/abs/2411.06068v1