HeadlinesBriefing.com

Independent Benchmark Tests Mythos Security Claims Against Competing AI Models

Hacker News

An anonymous researcher has built a benchmark suite to test whether other AI models can match Mythos's alleged prowess at finding security vulnerabilities. Mythos is reportedly restricted because of its powerful exploit-finding abilities, but that explanation has drawn skepticism, with some suspecting that cost rather than capability is the real reason for its limited availability.

The benchmark uses nine real bugs that Mythos discovered. The researcher verified that each one could be identified by top models such as Opus when examined directly. All nine bugs post-date the models' knowledge cutoffs, which keeps the test free of training-data contamination. Each model received identical access to the source files, with no hints about what to find, to mimic a realistic security audit.

The results surprised the creator, who had expected better performance across the board. Gemma 4 emerged as the leader, detecting 4 of the 9 bugs with perfect precision, while GPT 5.5 Pro consumed $100 across just four test cases. Most models struggled with multi-file vulnerabilities, which require deep contextual understanding.
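To make the reported figures concrete, here is a minimal sketch of how a result like "4 out of 9 with perfect precision" could be scored. It assumes that "perfect precision" means zero false-positive reports; the article does not describe the benchmark's actual scoring code, so the `Score` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """Hypothetical per-model tally; not the benchmark's real data model."""
    true_positives: int   # known bugs the model correctly reported
    false_positives: int  # reports that matched no known bug
    total_bugs: int       # known bugs in the benchmark

    @property
    def precision(self) -> float:
        # Share of the model's reports that were real bugs.
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        # Share of the known bugs that the model found.
        return self.true_positives / self.total_bugs if self.total_bugs else 0.0


# Figures from the article: 4 of 9 bugs found; zero false positives is assumed.
gemma = Score(true_positives=4, false_positives=0, total_bugs=9)
print(f"precision={gemma.precision:.2f} recall={gemma.recall:.2f}")
```

Under that assumption, the leading model's precision is 1.0 while its recall is 4/9, or about 0.44, which is why a flawless precision score still leaves more than half of the bugs undetected.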

Notably, Google's Antigravity CLI refused eight of nine security analysis requests outright, forcing the researcher to pay for API access instead. Agent-based testing consistently underperformed direct model API calls and cost significantly more. The data is sparse, but it suggests Mythos may genuinely excel at security bug detection, though the benchmark remains imperfect.