HeadlinesBriefing.com

o3 GeoGuessr Prompt Benchmark Reveals Surprising Results

Hacker News

In April 2025, Kelsey Piper reported that OpenAI's o3 model could pinpoint photo locations with remarkable accuracy, rivaling professional GeoGuessr players. The model correctly identified continents and countries from single images, sparking excitement about its hidden capabilities. However, the elaborate prompt that supposedly unlocked this skill may not be the magic bullet many believed.

A recent benchmark tested the famous GeoGuessr prompt against a basic "think carefully" prompt using 200 images from Wikimedia Commons and other sources. The results surprised researchers: the simple prompt achieved a median error of 83.24 km, compared with 102.34 km for the elaborate version. The two prompts performed similarly, with the simple one slightly ahead, suggesting the elaborate prompt added little value despite being ten times longer.
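For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a median error in kilometers could be computed for one prompt. It assumes each guess and each ground-truth location is a (latitude, longitude) pair in degrees and that distance is measured along a great circle with the haversine formula; the benchmark's actual scoring code is not described here, so the function names `haversine_km` and `median_error_km` are hypothetical.

```python
import math
from statistics import median

# Mean Earth radius in kilometers (a spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(guesses, truths):
    """Median distance between paired (lat, lon) guesses and true locations.

    Hypothetical helper: assumes one guess per image, in the same order
    as the ground-truth list.
    """
    if not guesses or len(guesses) != len(truths):
        raise ValueError("guesses and truths must be non-empty and equal length")
    return median(haversine_km(*g, *t) for g, t in zip(guesses, truths))


if __name__ == "__main__":
    # Illustrative coordinates only, not data from the benchmark.
    truths = [(48.8566, 2.3522), (40.7128, -74.0060), (35.6762, 139.6503)]
    guesses = [(51.5074, -0.1278), (40.7128, -74.0060), (34.6937, 135.5023)]
    print(f"median error: {median_error_km(guesses, truths):.2f} km")
```

Running the same function over each prompt's guesses for the same image set gives two medians that can be compared directly. The median is a sensible summary here because a few wildly wrong guesses (wrong continent) would dominate a mean.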

These findings reveal how easily developers can fool themselves about prompt effectiveness. When models already excel at a task, elaborate prompting often appears to help when it's actually the underlying capability doing the work. The research also shows that o3's geolocation abilities haven't transferred to newer models like gpt-5.4 and gpt-5.5, highlighting how specialized capabilities can be lost during model updates.

The study demonstrates that rigorous benchmarking is essential before declaring any prompt engineering breakthrough, especially given how quickly AI narratives shift in public discourse.