<span class="vcard">/u/Spielverderber23</span>

Question: Do LLMs memorize their state during multiple autoregressive iterations?

I am trying to understand GPT-3/4 conceptually; I do not yet have enough coding knowledge to understand it from code. Simple question: I know that GPT outputs one token (distribution) at a time and is then fed the result, thus giving the next token, and so on. But is e…
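To make the loop I am describing concrete, here is a minimal sketch in Python. It is not GPT code: `toy_next_token_distribution`, `VOCAB`, and `generate` are hypothetical names, and the "model" is just an arithmetic stand-in that returns a probability distribution over a tiny vocabulary.

```python
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]


def toy_next_token_distribution(tokens):
    """Stand-in for one forward pass (assumption, not a real LLM).

    Takes the full token sequence so far and returns one probability
    per entry in VOCAB. It depends only on its input, nothing else.
    """
    seed = sum(VOCAB.index(t) + 1 for t in tokens)
    weights = [((seed * (i + 3)) % 7) + 1 for i in range(len(VOCAB))]
    total = sum(weights)
    return [w / total for w in weights]


def generate(prompt, max_new_tokens=5, rng=None):
    """Autoregressive loop: sample one token, append it, feed everything back."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the toy run is repeatable
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The whole sequence so far is the input for this step.
        probs = toy_next_token_distribution(tokens)
        # Sample a single token from the distribution.
        next_token = rng.choices(VOCAB, weights=probs, k=1)[0]
        # The sampled token becomes part of the next step's input.
        tokens.append(next_token)
        if next_token == "<eos>":
            break
    return tokens


if __name__ == "__main__":
    print(generate(["the", "cat"]))
```

In this toy, the only thing carried from one iteration to the next is the growing token list. Whether a real GPT implementation keeps anything more than that between iterations is not something this sketch settles.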