HeadlinesBriefing.com

Statewright AI Agents: State Machines That Fix Brittle AI Reliability

Hacker News

Statewright AI tackles brittle agentic problem-solving by enforcing strict state machines over large models. Instead of scaling parameters, the Rust-based tool constrains workflows: planning states limit tools to read-only actions, implementation phases restrict edits with safety guards, and testing environments block destructive commands. This approach improved SWE-bench task completion rates from 2/10 to 10/10 for models like GPT-OSS 20B and Gemma4 31B, with 13B+ parameter models showing marked gains. The system’s protocol-layer enforcement prevents tool misuse without human oversight, unlike prompt-based guardrails.
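The per-phase tool restrictions can be pictured as an allowlist keyed by state. The sketch below is a hypothetical illustration of that idea, not Statewright's actual schema or API; the phase and tool names are assumptions for the example:

```python
from enum import Enum

class Phase(Enum):
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    TESTING = "testing"

# Hypothetical allowlists mirroring the article's description:
# planning is read-only, implementation permits guarded edits and
# limited bash, testing permits only pre-approved commands like pytest.
ALLOWED_TOOLS = {
    Phase.PLANNING: {"read_file", "list_dir", "grep"},
    Phase.IMPLEMENTATION: {"read_file", "edit_file", "bash_limited"},
    Phase.TESTING: {"read_file", "run_pytest"},
}

def tool_permitted(phase: Phase, tool: str) -> bool:
    """Check the allowlist at the protocol layer, not via prompts."""
    return tool in ALLOWED_TOOLS[phase]
```

Because the check sits outside the model's prompt, a model that "decides" to run a destructive command during testing simply never gets the call through.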

The core innovation lies in deterministic state transitions—models can’t skip steps or use unauthorized tools. For example, during implementation, agents face limited bash access and capped edit sizes, while testing phases only allow pre-approved commands like pytest. Statewright’s visual editor lets users map workflows, exposing failure paths and retry loops. Integration with Claude Code, Codex, and Cursor via MCP ensures compatibility, though Cursor’s architecture requires advisory rules rather than hard enforcement.
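Deterministic transitions like these can be modeled as a table of legal next states plus a guard on edit size. The names and the size cap below are illustrative assumptions for the sketch; the article does not publish Statewright's real limits:

```python
# Illustrative transition table: each state lists its legal successors,
# so an agent cannot jump from planning straight to testing.
LEGAL_TRANSITIONS = {
    "planning": {"implementation"},
    "implementation": {"testing"},
    "testing": {"implementation", "done"},  # retry loop on test failure
}

MAX_EDIT_CHARS = 4000  # assumed cap; the actual limit is not stated

def transition(current: str, requested: str) -> str:
    """Permit only transitions listed in the table."""
    if requested not in LEGAL_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {requested}")
    return requested

def guarded_edit(phase: str, patch: str) -> str:
    """Apply an edit only in the implementation phase, under the size cap."""
    if phase != "implementation":
        raise PermissionError("edits allowed only during implementation")
    if len(patch) > MAX_EDIT_CHARS:
        raise ValueError("edit exceeds size cap")
    return patch
```

A visual editor over a structure like this is straightforward: the transition table is the graph, and the `testing -> implementation` edge is the retry loop the article mentions.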

Research validated the method on local models, with five-task SWE-bench success rates doubling for mid-sized models. Frontier models such as Haiku and Sonnet also beat their baselines, solving tasks with fewer tokens and avoiding repeated-failure "death spirals". Below 13B parameters, limited context retention caps accuracy, but structured use of context lets Statewright outperform raw scale.

Available via a free tier (372-hour runtime limit) and paid plans ($29/month for unlimited), Statewright ships with Claude Code plugin support. Self-hosting options exist under the Apache 2.0 license. By treating states as "laws" rather than suggestions, the tool reduces hallucinations and improves reproducibility in agentic workflows.