Research released today by Meta: A general, scalable recipe to train AI to assist scientists in achieving their open-ended research goals:
Extract research goals and goal-specific grading rubrics from a large corpus of existing scientific papers with an LLM, and use them for RL training.
Reward the plans generated during training via self-grading: the initial model, given the rubrics, scores each plan, creating a generator-verifier gap.
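The self-grading loop can be sketched roughly like this. Everything below is a hypothetical illustration (function names, the keyword-matching stand-in for the LLM verifier, and the averaging scheme are all assumptions, not the paper's actual implementation): the frozen initial model would score a generated plan against each rubric criterion, and the mean score becomes the RL reward.

```python
# Hypothetical sketch of rubric-based self-grading reward.
# In the paper, the verifier would be the frozen *initial* model judging each
# rubric criterion; here a simple keyword check simulates that judgment.

def grade_plan(plan: str, rubric: list[str]) -> float:
    """Return the fraction of rubric criteria the plan satisfies (reward in [0, 1])."""
    satisfied = sum(1 for criterion in rubric if criterion.lower() in plan.lower())
    return satisfied / len(rubric)

# Toy goal-specific rubric and a candidate research plan:
rubric = ["baseline", "ablation", "evaluation metric"]
plan = "We compare against a strong baseline and report a clear evaluation metric."

reward = grade_plan(plan, rubric)  # 2 of 3 criteria matched
```

Because grading against an explicit rubric is easier than generating a good plan, even the untrained initial model can serve as a useful verifier, which is the generator-verifier gap the recipe relies on.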
Finetuning Qwen3-30B with self-grading improves research plans, according to human experts, for 70% of research goals in Machine Learning. The 30B model matches Grok-4-Thinking, though GPT-5-Thinking is a cut above the rest.
OpenAI models really are capable of accelerating science! The paper also shows significant cross-domain generalization, supporting the vision of generalist AI co-scientists.
[link] [comments]