Databricks releases OfficeQA, an ai benchmark for Grounded Reasoning.
There are multiple benchmarks that probe the frontier of agent capabilities (GDPval, Humanity's Last Exam (HLE), ARC-AGI-2), but we do not find them representative of the kinds of tasks that are important to our customers. To fill this gap, we'…