/u/__Tenacious___

Nobody is Doing AI Benchmarking Right

/u/__Tenacious___ July 8, 2025

The ways we measure LLMs' abilities, and thereby predict their impact, are seriously flawed. Basically all AI benchmarks have serious shortcomings: https://www.lesswrong.com/posts/aFW63qvHxDxg3J8ks/nobody-is-doing-ai-benchmarking-right submi…
