
Google Tests AI on Superconductivity Research: How LLMs Fare

Google AI Blog

Google AI researchers partnered with Cornell University to evaluate how large language models (LLMs) handle complex questions in condensed matter physics, focusing on high-temperature superconductivity. The study, published in the *Proceedings of the National Academy of Sciences*, tested six LLM-based systems (including GPT-4o, Claude 3.5, and NotebookLM) against expert-curated data. NotebookLM and a custom retrieval-augmented generation (RAG) system outperformed the web-based models, which often mixed established theories with speculative claims.

The evaluation used 67 expert-designed questions, such as those about LSCO doping thresholds and evidence for a quantum critical point. NotebookLM excelled at providing balanced perspectives and supporting evidence, while web-based models struggled with accuracy and contextual understanding. Curated databases of peer-reviewed papers proved critical, highlighting the gap between general-purpose, web-based LLMs and specialized scientific tools. The findings also underscore the need for LLMs to improve visual reasoning, as scientific papers rely heavily on tables and images. While the models showed promise, they often failed to recognize outdated hypotheses or omitted relevant studies.

Google plans to refine its CMT-benchmark for broader AI evaluation in condensed matter theory, emphasizing the role of expert collaboration in developing trustworthy scientific AI tools.