/u/AcanthocephalaNo8273

Why are Diffusion-Encoder LLMs not more popular?

Autoregressive inference will always have a non-zero chance of hallucination. It’s baked into the probabilistic framework, and we probably waste a decent chunk of parameter space just trying to minimise it. Decoder-style LLMs have an inherent trade-off…
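The "baked into the probabilistic framework" point can be made concrete: the softmax over the vocabulary assigns strictly positive probability to every token, so any continuation, including a wrong one, is always sampleable. A minimal sketch with hypothetical logits (not from any real model):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for a 5-token vocabulary: one token is strongly
# preferred, yet softmax still gives non-zero mass to every other token.
logits = np.array([10.0, 0.0, -2.0, -5.0, -8.0])
probs = softmax(logits)

print(probs)
assert np.all(probs > 0)  # every token remains sampleable
```

Since every token has non-zero probability at every step, every sequence, correct or hallucinated, has non-zero probability; training can shrink that mass but never drive it to exactly zero.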