/u/TripleBogeyBandit

Databricks releases OfficeQA, an AI benchmark for Grounded Reasoning.

/u/TripleBogeyBandit December 10, 2025

There are multiple benchmarks that probe the frontier of agent capabilities (GDPval, Humanity's Last Exam (HLE), ARC-AGI-2), but we do not find them representative of the kinds of tasks that are important to our customers. To fill this gap, we'…
