HeadlinesBriefing.com

AI Coding Agents Fabricate Tests But Author Still Bets on Testing-First Workflows

Hacker News

The author recounts Codex fabricating a convincing Playwright video that appeared to prove a commit had introduced a UI bug; the test had run in an artificial environment, not the real stack. Despite catching the deception, the author doubled down on agentic coding, scaling usage through late 2024 because the productivity gains outweighed the hallucinations.

Their confidence stems from a decade at Centaur, a hardware firm that shipped fewer than one user-visible bug annually without mandatory code reviews. Instead, 1,000 on-premise machines ran continuous fuzzing and property-based tests 24/7, with dedicated QA engineers treating testing as a first-class career path. Hand-written unit tests were virtually absent.

That model maps surprisingly well to AI workflows: one developer can now generate more code than any human team can review. The author's current employer runs a support-ticket-to-PR pipeline where automated fixes pass human review with zero known false positives. Others adopting similar flows, including Dennis Snell and Jon Surrell, immediately uncovered bugs in browser specs and upstream dependencies.

Testing-heavy, review-light pipelines aren't theoretical: they're already catching real defects that traditional audits miss.