Can we measure the amount of written human knowledge with the size of trained LLMs?

GPT-4’s trained model (weights plus structure) is said to be a few terabytes in size. Considering how much the model can do and how much data it was trained on, is it fair to say that all written human knowledge, once compressed and generalized, fits into a few terabytes (give or take, but within the same order of magnitude)? Or is that not a good way to measure the size of “knowledge”?
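
For scale, the “few terabytes” figure follows directly from parameter count times numeric precision. A minimal sketch of that arithmetic, assuming fp16 storage and a hypothetical parameter count (OpenAI has not disclosed GPT-4’s size; the 1.8 trillion figure below is an unconfirmed rumor used only for illustration):

```python
# Rough storage estimate for a trained LLM: parameters x bytes per parameter.
RUMORED_GPT4_PARAMS = 1.8e12   # hypothetical; OpenAI has not disclosed this
BYTES_PER_PARAM = 2            # fp16/bf16; fp32 would double this, int8 would halve it

size_bytes = RUMORED_GPT4_PARAMS * BYTES_PER_PARAM
print(f"~{size_bytes / 1e12:.1f} TB of weights")  # ~3.6 TB
```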

Note: OpenAI has not publicly disclosed GPT-4’s exact size or dataset composition, but for reference, GPT-3 was trained on roughly 45 TB of compressed raw text, filtered down to about 570 GB of clean, tokenized data, and the resulting model has 175 billion parameters.
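
Those public GPT-3 figures already show the weights coming out smaller than the filtered training text, which is what makes the “compressed and generalized” framing tempting. A quick back-of-the-envelope check (assuming 2 bytes per parameter; the precision used for the released figures isn’t public):

```python
# GPT-3 public figures: ~570 GB of filtered text, 175 billion parameters.
filtered_text_gb = 570
params = 175e9
weight_gb = params * 2 / 1e9   # assuming 2 bytes/param (fp16) -> 350 GB

ratio = filtered_text_gb / weight_gb
print(f"weights: ~{weight_gb:.0f} GB, text: {filtered_text_gb} GB, "
      f"ratio: ~{ratio:.1f}x")  # weights have <1 byte per byte of filtered text
```

Of course, the model doesn’t store the text verbatim; the ratio only bounds how much it could memorize per parameter, which is part of why model size is a slippery proxy for the “size of knowledge.”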

submitted by /u/sky_surfing