Question: Do LLMs memorize their state during multiple autoregressive iterations?
I'm trying to understand GPT-3/4 conceptually; I don't have enough coding knowledge yet to understand it from the code. Simple question: I know that GPT outputs one token (distribution) at a time and is then fed the result back, thus producing the next token, and so on. But is e…
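To make the loop I have in mind concrete, here is a minimal sketch of the autoregressive process as I understand it. The `toy_model` function is a hypothetical stand-in for GPT (it just returns an arbitrary next-token distribution so the example runs without any trained weights); the point is only the shape of the loop, where each step re-runs the model on the whole sequence so far:

```python
import numpy as np

VOCAB_SIZE = 10

def toy_model(tokens):
    # Hypothetical stand-in for a real transformer: map the current
    # token sequence to a probability distribution over the vocabulary.
    rng = np.random.default_rng(seed=sum(tokens))
    logits = rng.normal(size=VOCAB_SIZE)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax -> next-token distribution

def generate(prompt_tokens, n_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        # Each iteration feeds the *entire* sequence (prompt + tokens
        # generated so far) back into the model; no other state is
        # carried over between iterations in this naive version.
        dist = toy_model(tokens)
        next_token = int(np.argmax(dist))  # greedy pick for simplicity
        tokens.append(next_token)
    return tokens

print(generate([1, 2, 3], 5))
```

My question is essentially whether anything beyond this growing token sequence persists across iterations inside the model.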