A significant portion of the remaining training data for AI is located on magnetic tapes stored in warehouses.

I have been reading about the shortage of AI training data, and one aspect that hardly anyone considers is that much of the potential training data does not sit in any database system. It sits on old magnetic tapes that have been kept in climate-controlled lockers for decades. From the 1980s through the 2000s, major businesses, government offices, hospitals, television stations, and laboratories backed up everything to tape. Most of this data has never been digitized or properly indexed.

Now that companies are developing private LLMs, it turns out that the best datasets they have are sitting on tapes in boxes.

Based on all the predictions I have seen, the growth of internet-based training data will stall at some point, roughly around 2026. The next wave of training data could come from older archived materials.

submitted by /u/BudgetLimit6364