Hey guys,
So my senior just discovered local LLMs and he is obsessed with setting up a local LLM to answer questions about personal documents sourced from different platforms: DBs, PDFs, URLs, etc. His idea is to pitch this to some client. From what I have been able to set up so far (the GPT4All Windows app, which does not use the GPU; the GPT4All Python version, where I'm also not sure the GPU is used; and PrivateGPT), neither the response time nor the accuracy is anywhere near what a commercial product would need. Response time is always over 30 seconds, and the answers are hit-and-miss, even on VMs that cost $600 a month to run.
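For what it's worth, part of what I've been trying to rule out is whether the model is running on CPU the whole time. Below is a rough sketch of how I'd sanity-check GPU offload, assuming a GGUF checkpoint loaded through llama-cpp-python (that library and the model path are my assumptions for illustration, not necessarily what GPT4All or PrivateGPT do internally):

```python
# Minimal sketch, assuming a CUDA build of llama-cpp-python and a GGUF model file.
# The model path below is just a placeholder for whatever checkpoint is being tested.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-7b-model.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload all layers to the GPU; 0 means CPU-only
    n_ctx=4096,        # context window for document chunks plus the question
    verbose=True,      # the startup log shows how many layers actually landed on the GPU
)

out = llm("Summarize the attached document in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```

If everything is stuck at n_gpu_layers=0 (or there's no CUDA build at all), 30+ second answers on CPU wouldn't surprise me.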
Now, there seem to be new models released every day. Yesterday I spent the whole day trying to load the newest one, MPT-30B, on an AWS p3 EC2 instance with a Tesla V100 (16 GB GPU). The GPU ran out of memory while loading it; the model checkpoint itself is 30 GB. Whole day wasted.
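Doing the back-of-the-envelope memory math afterwards (rough rules of thumb, not numbers from any model card), I think that was never going to fit: the weights alone have to sit in VRAM, plus a few GB for the KV cache and runtime overhead, so a 30 GB checkpoint can't load into a 16 GB card without quantizing it down to around 4 bits or splitting it across GPUs.

```python
# Back-of-the-envelope VRAM estimate for a 30B-parameter model.
# Bytes-per-weight values are general rules of thumb, not MPT-30B specifics.
params = 30e9

for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    weights_gb = params * bytes_per_param / 1e9
    # KV cache and activation overhead depend on context length; a couple of GB is a rough floor.
    print(f"{name}: ~{weights_gb:.0f} GB weights (+ a few GB KV cache/overhead)")

# fp16 ~60 GB, int8 ~30 GB, 4-bit ~15 GB, so a single 16 GB V100 only has a chance with 4-bit quantization.
```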
This has turned into a bit of a wild goose chase, and I have the feeling it's a waste of time, or that there is something very basic I'm not understanding. What do you guys suggest?