Please correct my understanding of "memory" in LLMs
I'm trying to understand how GPTs/LLMs work at a conceptual level, using the correct terminology. Here's my understanding so far (please correct me if I'm wrong): GPTs are pre-trained so that, for any given input, they spit out the statistic…