Benchmarks would be better if they always included how humans scored in comparison, for both the median human and an expert human.
People often include comparisons to different models, but why not include humans too?
submitted by /u/katxwoods