HeadlinesBriefing.com

Greptile's TREX Brings Runtime Testing to AI Code Review

Hacker News

Greptile unveiled TREX, a system that executes code during AI-powered pull request reviews and generates artifacts showing what actually went wrong. Unlike traditional static analysis, which examines only the diff, TREX runs the code to catch bugs that are invisible in the source: logic errors, UI regressions, and race conditions that appear only in specific runtime states.

The team initially built TREX as a standalone test-generation agent but found that generating tests is a different task from finding bugs. Running separate agents wasted compute, because they explored overlapping areas of the codebase without sharing context, while combining them into a single agent overloaded the context window. The team settled on an orchestrator pattern instead: the main Greptile reviewer spawns a dedicated TREX subagent for each issue, and each subagent inherits the reviewer's context while staying focused on its own issue.
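The orchestrator pattern can be illustrated with a minimal Python sketch. This is not Greptile's implementation: every name here (`ReviewContext`, `SubagentTask`, `spawn_subagents`, `run_subagent`, `review`) is hypothetical, and the subagent body is a placeholder for the real work of executing code. The sketch shows only the shape of the idea: context is gathered once, and each subagent receives that shared context plus exactly one issue.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReviewContext:
    """Context gathered once by the main reviewer (hypothetical shape)."""
    diff_summary: str
    relevant_files: tuple[str, ...]


@dataclass
class SubagentTask:
    """One focused unit of work: a single suspected issue plus shared context."""
    issue: str
    context: ReviewContext
    notes: list[str] = field(default_factory=list)


def spawn_subagents(context: ReviewContext, issues: list[str]) -> list[SubagentTask]:
    # One task per issue. Every task references the same context object,
    # so nothing is re-explored, but each task owns exactly one issue.
    return [SubagentTask(issue=issue, context=context) for issue in issues]


def run_subagent(task: SubagentTask) -> str:
    # Placeholder for the real work (for example, running code in a sandbox).
    task.notes.append(f"investigated: {task.issue}")
    return f"{task.issue}: checked against {len(task.context.relevant_files)} file(s)"


def review(context: ReviewContext, issues: list[str]) -> list[str]:
    # The orchestrator: fan out one subagent per issue, then collect results.
    return [run_subagent(task) for task in spawn_subagents(context, issues)]
```

Calling `review(ctx, ["possible double retry", "timeout not reset"])` returns one result string per issue. Both subagents read the same `ctx`, which is the property that avoids the duplicated exploration described above.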

Early TREX versions reported findings as bullet points, which reviewers could not verify and which were prone to hallucination. TREX now captures multimodal artifacts: screenshots, logs, API traces, execution scripts, and even videos of animations. Each artifact is evidence a reviewer can check to pinpoint exactly where a failure occurs, much like showing the work in a math problem rather than only the answer.
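One way to picture an artifact-backed finding is as a record that carries its evidence with it. The sketch below is an assumption for illustration, not TREX's actual data model: the `Artifact`, `Finding`, and `reportable` names are hypothetical, and the rule that a finding without any artifact is dropped is this sketch's own policy rather than something the article states.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ArtifactKind(Enum):
    # The artifact types named in the article.
    SCREENSHOT = "screenshot"
    LOG = "log"
    API_TRACE = "api_trace"
    SCRIPT = "script"
    VIDEO = "video"


@dataclass(frozen=True)
class Artifact:
    kind: ArtifactKind
    path: str  # hypothetical location of the captured evidence


@dataclass
class Finding:
    summary: str
    artifacts: list[Artifact] = field(default_factory=list)

    def is_verifiable(self) -> bool:
        # A finding counts as verifiable only if evidence is attached.
        return len(self.artifacts) > 0


def reportable(findings: list[Finding]) -> list[Finding]:
    # Assumed policy: unsupported claims are filtered out before reporting.
    return [f for f in findings if f.is_verifiable()]
```

Under this policy, a bare claim such as `Finding("button overlaps footer")` would be filtered out, while the same claim with a screenshot attached would be kept.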

TREX runs in disposable sandboxed environments built from reusable base images for fast startup. The system is model-agnostic, so frontier models can be hot-swapped without rebuilds. Evaluation focuses on recall and precision rather than latency: accuracy takes priority over speed because developers prefer trustworthy reviews, even if they take longer.
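For readers unfamiliar with the two metrics, here is a minimal sketch of how precision and recall could be computed for a review run. The function name and the bug identifiers are made up for illustration, and the article does not describe how Greptile labels ground truth. Precision is the share of reported bugs that are real; recall is the share of real bugs that were reported.

```python
from __future__ import annotations


def precision_recall(reported: set[str], actual: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of reported bug IDs.

    `reported` is what the reviewer flagged; `actual` is the set of
    known real bugs (hypothetical ground truth).
    """
    true_positives = len(reported & actual)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Illustrative data only: four bugs flagged, three real bugs, two in common.
flagged = {"bug-1", "bug-2", "bug-3", "bug-4"}
real = {"bug-1", "bug-2", "bug-5"}
precision, recall = precision_recall(flagged, real)
```

With this made-up data, two of the four flagged bugs are real (precision 0.5), and two of the three real bugs were found (recall of roughly 0.67). Neither number depends on how long the review took, which fits an evaluation that ranks accuracy ahead of latency.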