/u/LahmacunBear

Unifying Probabilistic Learning in Transformers

NEW PAPER: Unifying Probabilistic Learning in Transformers. What if attention, diffusion, reasoning, and training were all the same thing? Our paper proposes a novel, unified way of understanding AI, and it looks a lot like quantum mechanics. Intellig…

Cheaper, Faster, Better Transformers. ELiTA: Linear-Time Attention Done Right

Yes, it's another Transformer architecture that aims to be cheaper and faster, but no, this is not the same. All of the improvements come from equations and architectural changes, with no hardware or code tricks. The performance is very good, testing on…
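
The preview cuts off before any of ELiTA's actual equations, so the sketch below is not the paper's method: it is a minimal, generic kernelized linear-time attention (in the style of Katharopoulos et al.), included only to illustrate the class of technique the title refers to. The elu-plus-one feature map and the normalization are assumptions, not anything stated in the post.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Generic (non-causal) linear-time attention sketch.

    q, k, v: (batch, seq, dim). Cost is O(seq * dim^2) rather than the
    O(seq^2 * dim) of standard softmax attention.
    """
    # Positive feature map standing in for softmax's exp(); elu(x) + 1
    # is a common choice in the linear-attention literature (an assumption
    # here, not ELiTA's stated feature map).
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    # Associativity trick: form K^T V once -- a (dim, dim) summary --
    # so the (seq, seq) attention matrix is never materialized.
    kv = torch.einsum("bsd,bse->bde", k, v)
    # Per-query normalizer q_i . sum_j k_j, replacing the softmax denominator.
    z = 1.0 / (torch.einsum("bsd,bd->bs", q, k.sum(dim=1)) + eps)
    return torch.einsum("bsd,bde,bs->bse", q, kv, z)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 128, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 128, 64])
```

The key design point this illustrates: because the feature-mapped scores factor, the sum over keys can be computed before the query is applied, which is what turns quadratic attention linear in sequence length.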