How long until structured data isn’t required for training and fine-tuning?

Many services offer to index your PDF, DOCX, and similar files, but we all (should) know that files like these are bloated with extraneous data that isn't needed and that makes them slower to parse.

At what point do you think we'll start to see a negligible difference in accuracy between models trained on structured data and models trained on unstructured data?

I understand that for some specific models, structured data will always be necessary, but what about common LLMs?

submitted by /u/avguru1