HeadlinesBriefing.com

Open‑source benchmark gauges AI coding agents' web reading

Hacker News

The open‑source Agent Reading Test gives developers a way to measure how well AI coding assistants ingest real‑world documentation. A developer points an agent, such as Claude Code, Cursor, or GitHub Copilot, at https://agentreadingtest.com/start/, and the agent must complete ten tasks that expose common web‑fetch failures. The results go into a scoring form that breaks down where each fetch pipeline falls short, helping teams prioritize fixes across platforms.

Each test page targets a specific failure mode defined in the Agent‑Friendly Documentation Spec. For example, a 150 KB page places canary tokens at 10 KB, 40 KB and later offsets, so the tokens an agent can report reveal where its fetch truncated the page. Another page buries content behind 80 KB of inline CSS, checking whether agents can separate style noise from meaningful text. The suite also includes single‑page‑app (SPA) shells, tabbed sections, and soft‑404 pages.
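The canary placement lets a tester bracket the truncation point by checking which tokens survive the fetch. The following is a minimal Python sketch, assuming the fetch keeps a contiguous prefix of the page. The token strings, the offsets beyond 40 KB, the 1 KB = 1024 bytes convention, and the helper names `build_page` and `truncation_window` are all hypothetical, not taken from the benchmark itself.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sketch: token strings and offsets are illustrative assumptions,
# not the Agent Reading Test's actual page contents.
KB = 1024  # assumption: 1 KB = 1024 bytes


def build_page(total_size: int, canaries: Dict[int, str], filler: str = ".") -> str:
    """Return a page of exactly total_size characters with each canary starting at its offset.

    Tokens must not overlap and filler must be a single character.
    """
    parts = []
    pos = 0
    for offset in sorted(canaries):
        token = canaries[offset]
        parts.append(filler * (offset - pos))  # pad up to this canary's offset
        parts.append(token)
        pos = offset + len(token)
    parts.append(filler * (total_size - pos))  # pad the tail to the full size
    return "".join(parts)


def truncation_window(
    fetched: str, canaries: Dict[int, str]
) -> Tuple[Optional[int], Optional[int]]:
    """Return (last offset whose token survived, first offset whose token is missing).

    Assumes the fetch keeps a prefix of the page; a second value of None means
    every canary survived.
    """
    last_seen = None
    for offset in sorted(canaries):
        if canaries[offset] not in fetched:
            return last_seen, offset
        last_seen = offset
    return last_seen, None


CANARIES = {
    10 * KB: "CANARY-A1",
    40 * KB: "CANARY-B2",
    100 * KB: "CANARY-C3",
    149 * KB: "CANARY-D4",
}

page = build_page(150 * KB, CANARIES)
fetched = page[: 50 * KB]  # simulate a pipeline that truncates at 50 KB
print(truncation_window(fetched, CANARIES))  # → (40960, 102400)
```

Here a pipeline that cuts the page at 50 KB keeps the 10 KB and 40 KB tokens but loses the rest, so the cut falls somewhere between 40 KB and 100 KB. Denser canary spacing would narrow that window.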

Scoring awards one point per discovered canary token and an additional point for each qualitative answer, up to a maximum of 20 points. Early runs show most agents clustering between 14 and 18, indicating that current fetch pipelines still drop content in realistic scenarios. The benchmark flips the usual focus by evaluating agents rather than documentation sites, and it gives developers a concrete metric for tracking improvement.
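The scoring rule reduces to a capped sum. The sketch below assumes each qualitative answer has already been graded correct or incorrect, and that only tokens from a known canary list earn credit. The function name `score` and its inputs are hypothetical, since the article does not describe how the scoring form computes totals.

```python
from typing import List, Set

MAX_SCORE = 20  # cap stated in the article


def score(reported: Set[str], expected: Set[str], answers_correct: List[bool]) -> int:
    """One point per expected canary the agent reported, one per correct answer, capped at MAX_SCORE."""
    canary_points = len(reported & expected)  # unknown or misspelled tokens earn nothing
    answer_points = sum(answers_correct)  # True counts as 1, False as 0
    return min(canary_points + answer_points, MAX_SCORE)


expected = {"CANARY-A1", "CANARY-B2", "CANARY-C3"}
reported = {"CANARY-A1", "CANARY-B2", "CANARY-ZZ"}  # one hallucinated token
print(score(reported, expected, [True, False, True]))  # → 4
```

Intersecting with the expected set keeps a hallucinated token from earning credit. Whether the real benchmark applies such a check is an assumption here.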