GPT-4’s trained model (its weights plus architecture) is rumored to be a few terabytes in size. Given how much the model can do and how much data it was trained on, is it fair to say that all written human knowledge, once compressed and generalized, fits into a few terabytes (give or take, but within the same order of magnitude)? Or is model size simply not a good way to measure the size of “knowledge”?
Note: OpenAI has not publicly disclosed GPT-4’s exact size or dataset composition. For reference, GPT-3 has 175 billion parameters and was trained on roughly 45 TB of compressed raw text, filtered down to about 570 GB of clean, tokenized data.
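For scale, here is a quick back-of-envelope calculation (a minimal sketch: the fp16 storage assumption and the GPT-4 parameter count are my own inputs, the latter an unconfirmed rumor, not anything OpenAI has published):

```python
# Rough size arithmetic, assuming fp16 storage (2 bytes per parameter).
# GPT-3 figures are from the published GPT-3 paper; the GPT-4 figure
# below is an unconfirmed rumor, used only for illustration.

GPT3_PARAMS = 175e9             # 175 billion parameters (published)
GPT3_CLEAN_TEXT_BYTES = 570e9   # ~570 GB of filtered training text (published)
BYTES_PER_PARAM_FP16 = 2

gpt3_weights_bytes = GPT3_PARAMS * BYTES_PER_PARAM_FP16
print(f"GPT-3 weights at fp16: {gpt3_weights_bytes / 1e9:.0f} GB")  # ~350 GB

# The weights come out smaller than the cleaned training text,
# i.e. the model is a lossy compression of its corpus.
print(f"weights / clean text: {gpt3_weights_bytes / GPT3_CLEAN_TEXT_BYTES:.2f}")  # ~0.61

# Hypothetical GPT-4 scale (rumored ~1.8T parameters, NOT confirmed):
rumored_gpt4_params = 1.8e12
gpt4_weights_tb = rumored_gpt4_params * BYTES_PER_PARAM_FP16 / 1e12
print(f"Rumored GPT-4 weights at fp16: {gpt4_weights_tb:.1f} TB")  # ~3.6 TB
```

On these numbers the weights are smaller than the cleaned text they were trained on, which is why LLMs are often described as lossy compressors of their corpus rather than literal archives of it.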