HeadlinesBriefing.com

AI Models Tested on Code Reviews: GPT-5.2 Leads

DEV Community

In a recent code review test, GPT-5.2 outperformed both Claude Opus 4.5 and Gemini 3 Pro, finding the most issues, including one security bug that it alone caught. Claude Opus 4.5 was the fastest, completing its review in just one minute, while Gemini 3 Pro was strong at detecting performance problems. All three frontier models identified 100% of the SQL injection vulnerabilities, showing that they reliably catch this class of critical security flaw.

The test used a TypeScript task management API seeded with intentional issues, including SQL injection and path traversal. GPT-5.2 found 13 issues in total, among them a critical authorization bypass and a synchronous file write that blocks the Node.js event loop. Claude Opus 4.5 identified 8 issues, focusing on security and correctness, while Gemini 3 Pro found 9 issues, including an N+1 query performance problem.
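To make two of these issue classes concrete, here is a minimal sketch of what SQL injection and an N+1 query pattern can look like in TypeScript, each next to a safer alternative. It is not code from the API used in the test: every name (`Query`, `FakeDb`, `findTasksSafe`, and so on) is hypothetical, the database is an in-memory stand-in that only counts round trips, and the `$1` placeholder assumes a PostgreSQL-style driver.

```typescript
// Illustrative only: all names and shapes here are assumptions, not the tested API.

interface Query {
  text: string;
  values: string[];
}

// Vulnerable: user input is concatenated into the SQL text itself.
function findTasksUnsafe(title: string): Query {
  return { text: `SELECT * FROM tasks WHERE title = '${title}'`, values: [] };
}

// Safer: the SQL text is constant and the input travels as a bound parameter.
function findTasksSafe(title: string): Query {
  return { text: "SELECT * FROM tasks WHERE title = $1", values: [title] };
}

interface Task {
  id: number;
  ownerId: number;
}

interface User {
  id: number;
  name: string;
}

// In-memory stand-in for a database; each method call counts as one round trip.
class FakeDb {
  queryCount = 0;
  private users: User[];

  constructor(users: User[]) {
    this.users = users;
  }

  getUser(id: number): User | undefined {
    this.queryCount++;
    return this.users.find((u) => u.id === id);
  }

  getUsers(ids: number[]): User[] {
    this.queryCount++;
    return this.users.filter((u) => ids.includes(u.id));
  }
}

// N+1 pattern: after the tasks are loaded, one extra lookup is issued per task.
function ownersNPlusOne(db: FakeDb, tasks: Task[]): (User | undefined)[] {
  return tasks.map((t) => db.getUser(t.ownerId));
}

// Batched: a single lookup covers all distinct owner ids.
function ownersBatched(db: FakeDb, tasks: Task[]): (User | undefined)[] {
  const ids = Array.from(new Set(tasks.map((t) => t.ownerId)));
  const byId = new Map<number, User>();
  for (const user of db.getUsers(ids)) {
    byId.set(user.id, user);
  }
  return tasks.map((t) => byId.get(t.ownerId));
}
```

With three tasks, `ownersNPlusOne` issues three lookups against `FakeDb` while `ownersBatched` issues one. A reviewer, human or model, would be expected to flag the first form once task lists grow.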

The results indicate that while frontier models offer deeper insights, free models like Grok Code Fast 1 can match them on security detection. This suggests that for many teams, free models may suffice for routine security checks, although frontier models provide additional value in performance and authorization analysis.

Looking ahead, as AI models continue to evolve, the gap between free and frontier models may narrow further. Teams should weigh speed, thoroughness, and cost against their specific needs when choosing a model. The findings underscore the growing role of AI in code review and the continued development of these tools.