/u/SpecialistBuffalo580

Is Humanity’s Last Exam a benchmark that measures real intelligence for AGI?

/u/SpecialistBuffalo580 November 30, 2025

With Grok 4 and Gemini 3, models have become really good at well-known benchmarks like ARC-AGI and HLE. But is that really proof of intelligence? Does acing these benchmarks truly show a capacity for original research and real understanding? I ask …
