Cheaper & Faster & Smarter (TurboQuant and Attention Residuals)
Google TurboQuant
This is a new compression algorithm. Every time a model answers a question, it stores a massive amount of intermediate data, and the longer the conversation, the more expensive that storage gets. Result: TurboQuant compresses that data 6x+ with no quality lo…