HeadlinesBriefing.com

AI Agent Debugs CI Flakiness at Scale with Raw Log Queries

Hacker News

An AI agent now autonomously traces flaky CI tests by querying terabytes of historical log data. It writes and executes its own SQL, scanning hundreds of millions of log lines across multiple queries to find root causes in seconds. To make this possible, the team gave the agent complete context: every build, test, and log line across months of history. The system ingests about 1.5 billion CI log lines per week.

Instead of a rigid tool API, the team exposed a raw SQL interface. This flexibility matters because LLMs are good at generating SQL, which lets the agent ask novel debugging questions that no predefined function could anticipate. Its queries hit two main targets: a materialized view of job metadata (63% of the time) and the raw log lines themselves (37%). Investigations typically move from broad failure rates to specific error traces.
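The broad-to-specific pattern described above can be sketched as follows. This is a minimal illustration, not the team's implementation: SQLite (from the Python standard library) stands in for ClickHouse, plain tables stand in for the materialized view, and the table names, column names, sample rows, and the `run_sql` helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for ClickHouse here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_metadata (job_id INTEGER, job_name TEXT, branch TEXT, status TEXT);
CREATE TABLE log_lines (job_id INTEGER, job_name TEXT, branch TEXT,
                        line_no INTEGER, message TEXT);
""")
conn.executemany("INSERT INTO job_metadata VALUES (?, ?, ?, ?)", [
    (1, "unit-tests", "main", "success"),
    (2, "unit-tests", "main", "failure"),
    (3, "unit-tests", "main", "success"),
    (4, "integration", "main", "failure"),
    (5, "integration", "main", "failure"),
    (6, "integration", "main", "success"),
    (7, "integration", "main", "failure"),
])
conn.executemany("INSERT INTO log_lines VALUES (?, ?, ?, ?, ?)", [
    (2, "unit-tests", "main", 1, "AssertionError: expected 3 got 4"),
    (4, "integration", "main", 1, "starting postgres container"),
    (4, "integration", "main", 2, "ERROR: connection refused on port 5432"),
    (5, "integration", "main", 1, "ERROR: connection refused on port 5432"),
    (7, "integration", "main", 1, "ERROR: test timed out after 60s"),
])


def run_sql(sql, params=()):
    """The single 'tool' the agent would call: run one SELECT, return rows.

    The prefix check is a naive read-only guard for illustration only.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql, params).fetchall()


# Step 1 (broad): failure rate per job, from the job metadata.
rates = run_sql("""
    SELECT job_name, AVG(status = 'failure') AS failure_rate, COUNT(*) AS runs
    FROM job_metadata
    GROUP BY job_name
    ORDER BY failure_rate DESC
""")
worst_job = rates[0][0]

# Step 2 (specific): most frequent error lines for the worst job.
# The metadata column lives on every log row, so no join is needed.
errors = run_sql("""
    SELECT message, COUNT(*) AS hits
    FROM log_lines
    WHERE job_name = ? AND message LIKE 'ERROR%'
    GROUP BY message
    ORDER BY hits DESC, message
""", (worst_job,))

print(worst_job, errors)
```

Because the second query is composed from the result of the first, neither has to be anticipated by a predefined function, which is the flexibility the raw SQL interface is meant to provide.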

Storing this data efficiently required a bold denormalization bet. Every log line carries 48 duplicated metadata columns (commit SHA, author, branch, job name, and so on). In a row-oriented database this would be wasteful, but in ClickHouse's columnar format the repeated values sit next to each other and compress almost entirely. This design lets the agent filter freely on any column without costly joins. The total uncompressed size is 5.31 TiB, but on disk it occupies just 154 GiB, a compression ratio of roughly 35:1.
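The reported ratio can be checked with simple arithmetic, and the effect behind it can be shown with a toy example. In the sketch below, the sizes are the figures quoted above; the second half is an assumption-laden illustration in which `zlib` stands in for whatever codec the real system uses, and the commit SHA value and row count are made up.

```python
import zlib

TIB = 1024 ** 4
GIB = 1024 ** 3

# Figures quoted in the article.
uncompressed = 5.31 * TIB
on_disk = 154 * GIB
ratio = uncompressed / on_disk
print(f"compression ratio: {ratio:.1f}:1")  # about 35.3:1

# Toy model of one denormalized metadata column: the same commit SHA
# (a made-up value) stored once per log line, contiguously, as a
# columnar layout would store it.
rows = 100_000
sha_column = ("3f9a1c7e" + "\n") * rows
raw = sha_column.encode()
packed = zlib.compress(raw)
column_ratio = len(raw) / len(packed)
print(f"repeated column: {len(raw)} bytes -> {len(packed)} bytes")
```

A column holding one value repeated many times shrinks far more than the overall 35:1 figure, which is consistent with the claim that the duplicated metadata adds little on-disk cost; the log messages themselves are presumably what dominate the remaining 154 GiB.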

Query performance is the real test. Across 52,000 investigations, median latency is 20ms for job metadata queries and 110ms for raw log scans. The heaviest sessions scan billions of rows but still complete in seconds. The results indicate that an LLM can effectively navigate a massive, structured dataset to perform complex root cause analysis that would take a human minutes of manual log scrolling.