<span class="vcard">/u/user0069420</span>
/u/user0069420

O3 beats 99.8% competitive coders

So apparently the equivalent percentile of a 2727 elo rating is 99.8 on codeforces Source: https://codeforces.com/blog/entry/126802 submitted by /u/user0069420 [link] [comments]

o1 LiveBench coding results

Note: Note: o1 was evaluated manually using ChatGPT. So far, it has only been scored on coding tasks. https://livebench.ai/#/ submitted by /u/user0069420 [link] [comments]

New architecture scaling

The new Alibaba QwQ 32B is exceptional for its size and is pretty much SOTA in terms of benchmarks, we had deepseek r1 lite a few days ago which should be 15B parameters if it's like the last DeepSeek Lite. It got me thinking what would happen if w…

Hallucinations in LLMs

I think Hallucinations in LLMs are what we call when we don't like the output, and creativity is what we call when we do like it, since they really think what they are responding is correct based on their training data and the context provided. Wha…