HeadlinesBriefing.com

Mythos AI passes major cybersecurity test, outpacing Claude

Ars Technica

The UK government's AI testing body, the AI Security Institute (AISI), evaluated advanced AI models for autonomous cyber capabilities and found that Mythos dramatically outperformed competitors. Mythos completed the TLO (Targeted Lateral Operation) benchmark from start to finish, a first for any tested model. The result suggests that autonomous AI-driven attacks are becoming a concrete risk for defenders.

Anthropic’s Claude 4.6 managed only 16 steps on average during the TLO evaluation, whereas the Mythos Preview runs averaged 22 successful infiltration steps out of the 32 required actions, or roughly 69% of the full attack chain. Even with its token budget capped at 100 million, Mythos sequenced offensive actions more effectively than its rivals in the simulated environments.

Despite this success, AISI cautioned that Mythos still falters on the more difficult "Cooling Tower" test, which simulates the disruption of power plant control software. The test environment also lacks active defenders and real-world detection mechanisms, so the true threat level to well-defended systems remains uncertain.

AISI stressed that organizations building defenses should start using similar AI tools now to harden infrastructure before future iterations of these autonomous hacking agents arrive. The performance gap between Mythos and the next-best model points to a rapid escalation in offensive AI capability, one that calls for the industry to take defensive AI adoption seriously.