<span class="vcard">/u/TripleBogeyBandit</span>

Databricks releases OfficeQA, an AI benchmark for grounded reasoning.

There are multiple benchmarks that probe the frontier of agent capabilities (GDPval, Humanity's Last Exam (HLE), ARC-AGI-2), but we do not find them representative of the kinds of tasks that are important to our customers. To fill this gap, we'…