HeadlinesBriefing.com

CursorBench 3.1 Reveals Top AI Coding Agents by Performance

Hacker News

Cursor has released CursorBench 3.1, a benchmark evaluating AI coding agents on real-world, multi-file tasks involving codebase understanding and bug detection. The test measures both performance scores and cost efficiency across leading models. Fable 5 Max achieved the highest performance score at 72.9%, though at a significant cost of $18.02 per task.

The benchmark introduces new problems focused on planning, refactoring, and code review, and improves the grading criteria for edit tasks. Results show a clear tradeoff between performance and cost: while the top-scoring model costs $18.02 per task, Composer 2.5 delivered competitive results at just $0.55 per task, roughly 33 times less.

Most models clustered between 50% and 70%, with Opus 4.8 and GPT-5.5 variants delivering solid results in that middle range. The results show which AI coding tools offer the best value for developer workflows, helping teams choose models that balance capability against budget constraints.
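One way a team might act on this performance-versus-cost tradeoff is to keep only the models that no other model beats on both score and cost (the Pareto frontier), then pick the highest score that fits a per-task budget. The sketch below is illustrative only and is not part of CursorBench; the `Result` type and the `pareto_frontier` and `best_under_budget` functions are hypothetical names, and the inputs are assumed to come from the published results table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    """One benchmark entry (hypothetical structure, not a CursorBench format)."""
    model: str
    score: float  # benchmark score in percent, e.g. 72.9
    cost: float   # average cost per task in US dollars, e.g. 18.02


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Return the models that no other model beats on both score and cost.

    A model is dominated if another model scores at least as high and costs
    no more, while being strictly better on at least one of the two.
    """
    frontier = []
    for r in results:
        dominated = any(
            o.score >= r.score
            and o.cost <= r.cost
            and (o.score > r.score or o.cost < r.cost)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    # Cheapest first, so the list reads as "what each extra dollar buys".
    return sorted(frontier, key=lambda r: r.cost)


def best_under_budget(results: list[Result], budget: float) -> Optional[Result]:
    """Return the highest-scoring model whose per-task cost fits the budget, or None."""
    affordable = [r for r in results if r.cost <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda r: r.score)
```

With a budget of $1.00 per task, for example, `best_under_budget` would never return Fable 5 Max ($18.02 per task), while Composer 2.5 ($0.55) would remain eligible. The frontier step is optional, but it makes dominated choices easy to spot before any budget is applied.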