HeadlinesBriefing.com

OpenAI Launches GeneBench-Pro to Test AI Scientific Reasoning

OpenAI Blog

OpenAI has introduced GeneBench-Pro, a new benchmark designed to test how well AI models handle complex computational biology tasks. Unlike previous evaluations that focus on simple fact retrieval, this benchmark requires models to demonstrate research-level judgment. It measures a model's ability to navigate messy datasets, revise assumptions, and make decisions as a human scientist would.

Researchers built the benchmark using synthetic data so that answers can be graded precisely against known targets. This approach prevents models from exploiting shortcuts or finding unintended solution pathways. The collection includes 129 problems spanning genomics and quantitative biology. External domain experts audited the questions for realism and technical accuracy.

Initial results show a large gap between current model capabilities and human expertise. The top-performing model, GPT-5.6 Sol, achieved a 28.7% pass rate on tasks that typically take human experts 20 to 40 hours to complete. This performance gap highlights the difficulty of closing the inferential loop in high-stakes scientific research. The benchmark remains a rigorous test for the next generation of reasoning models.