Databricks releases OfficeQA, an AI benchmark for Grounded Reasoning.

There are multiple benchmarks that probe the frontier of agent capabilities (GDPval, Humanity's Last Exam (HLE), ARC-AGI-2), but we do not find them representative of the kinds of tasks that are important to our customers. To fill this gap, we've created and are open-sourcing OfficeQA—a benchmark that proxies for economically valuable tasks performed by Databricks' enterprise customers. We focus on a very common yet challenging enterprise task: Grounded Reasoning, which involves answering questions based on complex proprietary datasets that include unstructured documents and tabular data.

https://www.databricks.com/blog/introducing-officeqa-benchmark-end-to-end-grounded-reasoning

submitted by /u/TripleBogeyBandit