<span class="vcard">/u/__Tenacious___</span>

Nobody is Doing AI Benchmarking Right

The ways we measure LLMs' abilities, and thereby predict their impact, are seriously flawed: nearly every AI benchmark in use has significant shortcomings. Full post: https://www.lesswrong.com/posts/aFW63qvHxDxg3J8ks/nobody-is-doing-ai-benchmarking-right