New architecture scaling
The new Alibaba QwQ 32B is exceptional for its size and is pretty much SOTA on benchmarks. We also got DeepSeek R1 Lite a few days ago, which should be about 15B parameters if it's like the last DeepSeek Lite. It got me thinking what would happen if w…