HeadlinesBriefing.com

AI code tools rewrite too much, study shows

Hacker News

AI‑assisted coding tools such as Cursor, GitHub Copilot, Claude Code and Codex are now routine, but users report a side effect: models often rewrite more code than necessary. When asked to fix a simple off‑by‑one bug, a model may replace the entire function, add helper routines and rename variables, producing a massive diff. This behavior, dubbed “over‑editing,” inflates review effort.
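The pattern described above can be illustrated with a toy off-by-one bug. The function names and the specific rewrite below are invented for this sketch; they are not taken from the study.

```python
# Hypothetical illustration of "over-editing": the same off-by-one bug
# fixed two ways. All names here are made up for demonstration.

def buggy_last_index(items):
    # Off-by-one bug: should be len(items) - 1
    return len(items)

def minimal_fix_last_index(items):
    # The one-token fix a reviewer expects: a tiny, obvious diff
    return len(items) - 1

def over_edited_last_index(items):
    # Behaviorally identical, but renames variables, adds a helper,
    # and restructures the body -- a far larger diff to review
    def _validate(sequence):
        if not sequence:
            raise ValueError("empty sequence")
        return sequence

    validated = _validate(items)
    return len(validated) - 1
```

Both fixes pass the same tests, which is exactly why correctness metrics alone cannot distinguish them.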

To quantify the problem, researchers built a benchmark from 400 corrupted Python snippets in BigCodeBench, each altered by a single operator or boolean flip so that the minimal fix is unambiguous. They scored model patches against the ground‑truth edit on two metrics: token‑level Levenshtein distance and added cognitive complexity. Results show that even top‑tier models rewrite far beyond the minimal change.
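A minimal sketch of the distance metric as described: token-level Levenshtein distance between a model patch and the ground-truth fix, normalized by the longer token sequence. Tokenization here uses Python's stdlib `tokenize`; the study's exact tokenizer and normalization are assumptions.

```python
# Sketch of a normalized token-level Levenshtein metric.
import io
import tokenize

def tokens(src: str) -> list[str]:
    # Split Python source into lexical tokens, dropping whitespace-only ones
    return [t.string
            for t in tokenize.generate_tokens(io.StringIO(src).readline)
            if t.string.strip()]

def levenshtein(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming edit distance over token sequences
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        curr = [i]
        for j, tb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ta != tb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_levenshtein(patch: str, truth: str) -> float:
    # 0.0 means an identical patch; values near 1.0 mean a full rewrite
    a, b = tokens(patch), tokens(truth)
    return levenshtein(a, b) / max(len(a), len(b), 1)
```

For example, `normalized_levenshtein("x = a - 1", "x = a + 1")` differs in one of five tokens, giving 0.2; a wholesale rewrite of a fixed function pushes the score toward 1.0.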

Among the evaluated systems, GPT‑5.4 registers the highest over‑editing scores, with a normalized Levenshtein of 0.39 in reasoning mode and added cognitive complexity exceeding two points, while its Pass@1 hovers below 0.73. The findings suggest that correctness metrics alone miss a critical dimension of code quality, and future toolchains must penalize unnecessary rewrites.

Developers can mitigate over‑editing by constraining prompts, using low‑effort reasoning settings, or post‑processing diffs to isolate minimal patches. Integrating the new benchmark into CI pipelines gives teams a quantitative guardrail, ensuring AI suggestions improve functionality without inflating cognitive load during code review.
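The diff post-processing idea can be sketched with the stdlib `difflib`: compare the model's rewrite against the original and keep only the changed lines, plus a cheap "fraction of lines touched" number that a CI gate could threshold. Real tools would likely work on ASTs or full unified diffs; this line-level version is an assumption.

```python
# Hedged sketch: isolate the minimal line-level patch from a model rewrite
# and compute a simple edit-ratio guardrail metric for CI.
import difflib

def minimal_patch(original: str, rewritten: str) -> list[str]:
    # Unified diff with zero context lines (n=0), keeping only +/- lines
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        rewritten.splitlines(keepends=True),
        fromfile="original", tofile="rewritten", n=0)
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

def edit_ratio(original: str, rewritten: str) -> float:
    # Fraction of original lines touched; a CI job could fail above a cap
    changed = len(minimal_patch(original, rewritten))
    return changed / max(len(original.splitlines()), 1)
```

A one-line change in a three-line file yields two patch lines (the removal and the addition) and an edit ratio of about 0.67; a full rewrite scores far higher and could be flagged for review.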