HeadlinesBriefing.com

LLM agents stumble on backend structural constraints

Hacker News

Researchers evaluated LLM agents on backend code generation against a fixed API contract, using 80 greenfield and 20 feature-implementation tasks that span eight web frameworks. The evaluation was twofold: end-to-end behavioral tests were combined with static verifiers so that the effect of structural complexity could be isolated. Typical benchmarks reward only functional correctness and ignore requirements on architectural patterns, databases, and ORM mappings. The study identified a phenomenon the authors call constraint decay: performance drops as structural requirements are added to a task.
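To make the twofold evaluation concrete, here is a minimal Python sketch, not the researchers' actual harness. It assumes a hypothetical task ("implement `count_users` and store the data through `sqlite3`") and scores a candidate on two independent axes: a behavioral check that runs the code, and a static check that inspects its imports with the standard `ast` module. All function names, the task, and both candidate snippets are invented for illustration.

```python
import ast
import sqlite3  # not used directly; the first candidate imports it when executed


def imported_modules(source):
    """Return the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def structural_check(source, required):
    """Static verifier: pass only if every required module is imported."""
    return set(required) <= imported_modules(source)


def behavioral_check(source, func_name, args, expected):
    """Behavioral test: execute the candidate and compare its output."""
    namespace = {}
    exec(source, namespace)  # acceptable for a sketch; sandbox this in practice
    return namespace[func_name](*args) == expected


def evaluate(source):
    """Score one candidate on both axes for the hypothetical task."""
    return {
        "behavioral": behavioral_check(source, "count_users", (["ann", "bob"],), 2),
        "structural": structural_check(source, {"sqlite3"}),
    }


# Hypothetical candidate that honors the data-layer constraint.
CANDIDATE_WITH_DB = '''
import sqlite3

def count_users(names):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(n,) for n in names])
    (total,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    conn.close()
    return total
'''

# Hypothetical candidate that is functionally correct but ignores the constraint.
CANDIDATE_NO_DB = '''
def count_users(names):
    return len(names)
'''

print(evaluate(CANDIDATE_WITH_DB))
print(evaluate(CANDIDATE_NO_DB))
```

Both candidates pass the behavioral check, but only the first passes the structural one. A benchmark that scored functional correctness alone would rate them equally, which is the blind spot the static verifiers are meant to close.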

Capable configurations lost roughly 30 points in assertion pass rate when moving from baseline to fully specified tasks, while some weaker setups fell to near zero. Sensitivity analysis showed that agents performed well on minimal, explicit frameworks such as Flask but struggled in convention-heavy environments like FastAPI and Django. The resulting performance gaps are large and highlight the brittleness of code that relies on implicit conventions.

Error analysis traced most failures to data-layer defects, including incorrect query composition and runtime ORM violations. These findings indicate that current LLM agents still cannot reliably satisfy both functional correctness and architectural constraints in production code. Integrating feedback loops could reduce the decay, but doing so would require advances in static analysis speed and prompt engineering.