Cheaper, Faster, Better Transformers. ELiTA: Linear-Time Attention Done Right
Yes, it's another Transformer architecture that aims to be cheaper and faster, but no, it is not more of the same. All of the improvements come from equations and architectural changes, with no hardware or code tricks. The performance is very good, testing on…