Databricks releases OfficeQA, an AI benchmark for grounded reasoning.

There are multiple benchmarks that probe the frontier of agent capabilities (GDPval, Humanity's Last Exam (HLE), ARC-AGI-2), but we do not find them representative of the kinds of tasks that are important to our customers. To fill this gap, we've created and are open-sourcing OfficeQA—a benchmark that proxies for economically valuable tasks performed by Databricks' enterprise customers. We focus on a very common yet challenging enterprise task: Grounded Reasoning, which involves answering questions based on complex proprietary datasets that include unstructured documents and tabular data.

https://www.databricks.com/blog/introducing-officeqa-benchmark-end-to-end-grounded-reasoning

submitted by /u/TripleBogeyBandit