HeadlinesBriefing.com

LLMs Fail at Video Games, Revealing AI Limits

Hacker News

Large language models have sprinted past coding benchmarks, yet they still flounder in video games. Even when a few, like Gemini 2.5 Pro, toppled *Pokemon Blue* in May 2025, those victories came after slow, error‑prone runs and custom wrappers. The gap exposes a core weakness: LLMs lack real‑time spatial reasoning and adaptive strategy.

Julian Togelius of New York University’s Game Innovation Lab explains that coding feels like a tidy game: clear specs, instant feedback, and repeatable tests. Video games break that pattern with shifting mechanics, diverse inputs, and massive data gaps. Even popular titles demand millions of hours of community knowledge, a resource absent for niche releases. This gap highlights the limits of current reinforcement‑learning approaches.

Benchmarking LLMs on games proves harder than on code. The former General Video Game AI competition ran for seven years, inventing ten new titles each cycle, yet progress stalled. LLMs beat simple search methods only when trained on game data, revealing that current AI excels at well‑defined tasks but falters when environmental diversity spikes. Consequently, developers must design specialized agents rather than rely on generalist models.