HeadlinesBriefing.com

Claude Opus 4.5, GPT-5.2 High, Gemini 3 Pro Coding Test

DEV Community

A developer pitted three top coding models against each other on real tasks in a popular open-source repository. Claude Opus 4.5 delivered the most polished and consistent results, building both a global action palette and an analytics dashboard with minimal fuss, though it was also the most expensive. GPT-5.2 High took longer because of its high-reasoning setting but produced excellent, well-structured code. Gemini 3 Pro was the fastest and cheapest, but its output felt minimal and often lacked polish and integration points.

The test used the same prompts and codebase for all three models and evaluated code quality, the amount of hand-holding required, and final functionality. All three models built the requested features, but the depth and completeness of their work varied significantly. Claude Opus 4.5 and GPT-5.2 High produced more comprehensive, production-ready output, whereas Gemini 3 Pro delivered functional but bare-bones versions. The results highlight a clear trade-off between cost, speed, and output quality.

For now, these models are best viewed as powerful assistants for refactoring and prototyping, not as replacements for human developers in complex production codebases. The author cautions that, although the recent improvements are impressive, relying solely on generated code for long-term projects remains risky. The field is advancing rapidly, but final polish and architectural decisions still benefit heavily from human oversight.