Nobody is Doing AI Benchmarking Right
The ways we measure LLMs' abilities, and thereby predict their impact, are seriously flawed. Basically all AI benchmarks have serious shortcomings: https://www.lesswrong.com/posts/aFW63qvHxDxg3J8ks/nobody-is-doing-ai-benchmarking-right submi…