/u/user0069420

o3 beats 99.8% of competitive coders

/u/user0069420 December 20, 2024

So apparently a 2727 Elo rating on Codeforces is equivalent to the 99.8th percentile. Source: https://codeforces.com/blog/entry/126802

artificial

o1 LiveBench coding results

/u/user0069420 December 10, 2024

Note: o1 was evaluated manually using ChatGPT. So far, it has only been scored on coding tasks. https://livebench.ai/#/

artificial

New architecture scaling

/u/user0069420 November 28, 2024

The new Alibaba QwQ 32B is exceptional for its size and is pretty much SOTA in terms of benchmarks. We had DeepSeek R1 Lite a few days ago, which should be 15B parameters if it's like the last DeepSeek Lite. It got me thinking what would happen if w…

artificial

Hallucinations in LLMs

/u/user0069420 October 20, 2024

I think hallucinations in LLMs are what we call the output when we don't like it, and creativity is what we call it when we do, since the models really "think" what they are responding with is correct based on their training data and the context provided. Wha…
