The Gemini release was really interesting in that they sort of buried the lede by not leading with the ~99% retrieval accuracy across the context window.
OpenAI's 128k context window falls down pretty quickly in practice and is really only 32k-64k if you care about your context actually being used.
Ideally you would just fit all your data into the 10M token context window, but as I understand it that works out to roughly $5 per request.
That's going to get expensive quickly for a lot of applications.
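To put rough numbers on it (the per-token price here is just an assumption picked to line up with that ~$5 figure, not a published rate):

```python
# Back-of-the-envelope cost estimate. The per-token price is an assumption
# chosen to match the ~$5-per-call figure above, not a published rate.
PRICE_PER_M_INPUT_TOKENS = 0.50   # USD per 1M input tokens (assumed)
CONTEXT_TOKENS = 10_000_000       # a full 10M-token context per request

cost_per_request = CONTEXT_TOKENS / 1_000_000 * PRICE_PER_M_INPUT_TOKENS
print(f"Cost per request: ${cost_per_request:.2f}")   # -> $5.00

# Even modest traffic multiplies that quickly.
for requests_per_day in (100, 1_000, 10_000):
    daily = cost_per_request * requests_per_day
    print(f"{requests_per_day:>6} requests/day -> ${daily:,.0f}/day")
```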
The question is how long this will be the case. If RAG is only about cost savings, I can see its use starting to fade over the next 1-2 years, with most people just pushing everything into the context window.
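For comparison, here's the same back-of-the-envelope math for a RAG-style prompt; chunk size, top-k, and pricing are all made-up assumptions for illustration:

```python
# Hypothetical comparison of full-context prompting vs. a RAG-style prompt.
# Chunk size, top-k, and pricing are illustrative assumptions only.
PRICE_PER_M_INPUT_TOKENS = 0.50   # USD per 1M input tokens (assumed)

FULL_CONTEXT_TOKENS = 10_000_000  # push everything into the window
RAG_CHUNK_TOKENS = 1_000          # assumed size of each retrieved chunk
RAG_TOP_K = 20                    # assumed number of chunks retrieved
RAG_OVERHEAD_TOKENS = 2_000       # query, instructions, formatting

def prompt_cost(tokens: int) -> float:
    """Input-token cost for a single request at the assumed rate."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT_TOKENS

full_cost = prompt_cost(FULL_CONTEXT_TOKENS)
rag_cost = prompt_cost(RAG_TOP_K * RAG_CHUNK_TOKENS + RAG_OVERHEAD_TOKENS)

print(f"Full context: ${full_cost:.2f} per request")
print(f"RAG prompt:   ${rag_cost:.4f} per request")
print(f"RAG is roughly {full_cost / rag_cost:.0f}x cheaper per request")
```

Under those assumptions the gap is a few hundred x per request, which is the whole cost-savings argument for RAG in a nutshell.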