I can't believe they really did this: Dolma's groundbreaking 3 trillion tokens, paving the way for innovation and open-access progress.
Free, for science, under an open-source license. That is unbelievable. Guys, what do you think?! Is this a milestone for data science?
https://kinews24.de/dolma-worlds-largest-free-dataset-with-3-trillion-tokens-for-llm-training-released