When there is no answer key for scientific discovery, how do we verify an AI hypothesis?
I have been thinking a lot about the actual limits of AI-driven scientific discovery, specifically how we evaluate models when they propose genuinely new hypotheses for which no "answer key" exists. When we test LLMs on standard benchmarks,…