HeadlinesBriefing.com

SWE-gen automates software bug task creation

Hacker News: Front Page

Abundant AI released SWE-gen, a tool that converts merged GitHub pull requests into automated bug-fix tasks for software engineering benchmarks. It uses Claude Code to analyze repositories, detect languages and build systems, then reverses PRs to recreate buggy states. This automates the labor-intensive process of creating SWE-bench style problems.
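As a rough illustration of what "reversing a PR" can mean in practice, the sketch below undoes a fix by applying its patch in reverse with `git apply --reverse`. This is not SWE-gen's actual implementation, which the source describes only as driven by Claude Code. The function names, the patch file, and the assumption that the patch contains only the non-test changes (so the PR's tests stay in place and fail) are all hypothetical.

```python
import subprocess
from pathlib import Path


def reverse_apply_command(patch_file: str, check_only: bool = False) -> list[str]:
    """Build a `git apply` command that applies a patch in reverse.

    Hypothetical helper for illustration; not part of SWE-gen.
    """
    cmd = ["git", "apply", "--reverse"]
    if check_only:
        # --check verifies the patch would apply without touching any files.
        cmd.append("--check")
    cmd.append(patch_file)
    return cmd


def recreate_buggy_state(repo_dir: str, patch_file: str) -> None:
    """Undo a merged fix inside a checkout that already contains it.

    Assumes `patch_file` is an absolute path to a diff holding only the
    source (non-test) changes of the PR.
    """
    repo = Path(repo_dir)
    # Dry run first so a patch that no longer applies fails before any edit.
    subprocess.run(reverse_apply_command(patch_file, check_only=True), cwd=repo, check=True)
    subprocess.run(reverse_apply_command(patch_file), cwd=repo, check=True)
```

Running the dry run first means a patch that has drifted from the checkout raises `subprocess.CalledProcessError` without leaving the repository half-modified.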

Each generated task comes with tests that fail on the buggy baseline code and pass once the fix is applied. The tool is fully containerized, works with any programming language, and supports multiple cloud sandbox environments. It recently produced a 1,000-task dataset for JavaScript/TypeScript, demonstrating that the approach scales for benchmarking AI coding agents.

Developers can install it via `uv`, then use commands like `swegen create` or `swegen farm` to process repositories. The workflow validates each task with two agents: NOP, which confirms that the tests fail on the baseline, and Oracle, which confirms that they pass with the solution applied. Automating task creation this way addresses a key bottleneck in evaluating how well AI can solve real-world software bugs.
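The NOP/Oracle check reduces to a simple rule: a task is kept only if its tests fail on the baseline and pass with the solution. A minimal sketch of that rule follows. The `TaskRun` type and `is_valid_task` function are hypothetical names chosen for illustration, not SWE-gen's API.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Test outcomes for one candidate task (hypothetical structure)."""
    nop_tests_passed: bool     # result on the unchanged, buggy baseline
    oracle_tests_passed: bool  # result after the reference solution is applied


def is_valid_task(run: TaskRun) -> bool:
    """Keep a task only if tests fail on the baseline and pass with the fix."""
    return (not run.nop_tests_passed) and run.oracle_tests_passed


candidates = {
    "good-task": TaskRun(nop_tests_passed=False, oracle_tests_passed=True),
    "tests-never-fail": TaskRun(nop_tests_passed=True, oracle_tests_passed=True),
    "fix-does-not-work": TaskRun(nop_tests_passed=False, oracle_tests_passed=False),
}
valid = [name for name, run in candidates.items() if is_valid_task(run)]
print(valid)
```

The two rejected cases correspond to the two ways a generated task can be useless: tests that pass even on the buggy code do not detect the bug, and tests that still fail after the fix cannot be satisfied by any agent.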