HeadlinesBriefing.com

OpenAI's GPT-5.5 Shatters Pen‑Testing Benchmarks

Hacker News

OpenAI released GPT-5.5, a model that rivals Anthropic's Mythos but is publicly available. XBOW tested it inside its automated penetration‑testing agents, measuring everything from vulnerability discovery to final reporting. The evaluation framework freezes known‑vulnerable open‑source apps and tracks how many flaws each model uncovers, providing a realistic end‑to‑end metric.

Across XBOW's internal benchmark, GPT-5.5 cut the miss rate to 10%, down from GPT-5's 40% and Opus 4.6's 18%. The gain appears in both black‑box and white‑box scenarios, with the model even outpacing GPT-5 when source code is supplied—effectively collapsing the expected performance gap. In visual‑acuity tests it hit 97.5%, matching Anthropic's top result.
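The miss-rate comparison above reduces to a simple ratio over the benchmark's seeded vulnerabilities. A minimal sketch of that computation, assuming each run records the set of known flaws in the frozen app and the set the agent actually uncovered (all names and data here are hypothetical, not XBOW's actual harness):

```python
def miss_rate(known_flaws: set[str], found_flaws: set[str]) -> float:
    """Fraction of known vulnerabilities the agent failed to uncover."""
    if not known_flaws:
        raise ValueError("benchmark must contain at least one known flaw")
    missed = known_flaws - found_flaws
    return len(missed) / len(known_flaws)

# Hypothetical run: 10 seeded flaws, the agent finds 9 of them,
# giving a 10% miss rate (the headline figure reported for GPT-5.5).
known = {f"FLAW-{i}" for i in range(10)}
found = known - {"FLAW-7"}
print(miss_rate(known, found))  # 0.1
```

Aggregating this ratio across the frozen open-source apps yields the end-to-end percentages quoted above.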

Operationally, the new model logs into targets using roughly half the iterations of any prior version and aborts failed attempts twice as fast, tightening feedback loops for security teams. While XBOW will continue to route tasks to the most suitable model, GPT-5.5 now defines the baseline for automated pentesting workflows, delivering faster, more comprehensive assessments.