I’m wondering if the SaaS LLM offerings aren’t quite good enough yet for my use case. I need to extract about thirty key pieces of information from sets of PDF files programmatically.
Each file set will contain between 2 and 20 files, and the data is fairly complex legal content. A reasonably intelligent person could do most of this work without a legal background: for example, identifying a court case number and the name of the plaintiff.
Some of the documents are several MB but most are smaller than 1 MB. Altogether I have about three thousand of these documents and will be collecting several hundred new ones every day.
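To make the target concrete, here's a rough sketch of the output I'd want per file set. The class and field names are placeholders I made up for illustration; only the case number and plaintiff name are actual examples of the roughly thirty fields, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical field names. The real list would have about thirty entries;
# only these two are mentioned in the post.
REQUIRED_FIELDS = ("case_number", "plaintiff_name")


@dataclass
class FileSetResult:
    """Extracted values for one set of related PDF files."""
    set_id: str
    pdf_paths: List[str]
    fields: Dict[str, Optional[str]] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        # A field counts as missing if it is absent, None, or empty.
        return [name for name in REQUIRED_FIELDS if not self.fields.get(name)]


def check_set_size(result: FileSetResult) -> None:
    # Each set is expected to hold between 2 and 20 files.
    if not 2 <= len(result.pdf_paths) <= 20:
        raise ValueError(
            f"set {result.set_id} has {len(result.pdf_paths)} files, expected 2 to 20"
        )


# Made-up example: one set where the extractor found only the case number.
example = FileSetResult(
    set_id="set-001",
    pdf_paths=["complaint.pdf", "summons.pdf"],
    fields={"case_number": "2023-CV-00123"},
)
check_set_size(example)
print(example.missing_fields())
```

Whatever does the extraction (an LLM or otherwise) would fill in `fields`, and anything left in `missing_fields()` could be flagged for a human to review.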
Anyone doing something like this right now?