HeadlinesBriefing.com

Anthropic rolls out Claude Fable 5 with hard‑line safety filters

Ars Technica

Anthropic released Claude Fable 5, its first Mythos‑class model, claiming it outperforms the earlier Opus line. The launch pairs the new system with strict content filters that block queries about cybersecurity, biology and chemistry, areas the firm says could empower malicious actors. Fable 5 routes those requests to the older Claude Opus 4.8 and warns users of the redirection.

Anthropic says the safeguards are deliberately stricter than ideal, accepting a false‑positive rate under five percent in internal testing. Red‑team exercises totaling more than 1,000 hours and a bug‑bounty program failed to discover a universal jailbreak, and automated attacks were rebuffed more often than against previous Claude models. The company cites concern that the Mythos‑class model could perform “agentic hacking,” chaining multi‑step exploits.

In external validation, the UK AI Security Institute found Mythos Preview performed on par with OpenAI’s GPT‑5.5 on Capture‑the‑Flag challenges, suggesting the leap is not unique to Anthropic’s architecture. By funneling high‑risk topics to a less capable model, Anthropic aims to blunt potential abuse while still offering developers access to a stronger AI for other tasks. The approach marks a cautious step toward safer frontier models.