HeadlinesBriefing.com

Anthropic's Fable AI faces backlash over strict guardrails

Hacker News

Anthropic rolled out Fable on Tuesday as a public, limited version of its high‑profile cybersecurity model Mythos. The launch immediately drew pushback from security researchers, who say the model blocks even benign queries. IBM X‑Force analyst Valentina “Chompie” Palmiotti noted that having Fable read a simple blog post triggers a refusal, and other experts report similar overreach.

Anthropic said the guardrails prevent Fable from being used to craft malware or bioweapons, extending precautions that first appeared in Mythos’s Project Glasswing. That program limited access to a handful of vetted firms until last month, when Anthropic opened Mythos to hundreds of organizations across 15 countries. Approved participants can join a Cyber Verification Program to obtain looser restrictions.

When a prompt trips Fable’s keyword filter, the model pauses and falls back to Claude Opus 4.8, citing “cybersecurity or biology” flags. Researchers such as Matt Suiche report that requests for secure code or a simple review get downgraded this way, which limits legitimate engineering work. Anthropic declined to comment, leaving the community to wait for more refined guardrails.
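
Anthropic has not described how the filter works internally, so the following is only a toy sketch of why keyword matching on its own tends to over-block. The flagged terms, model labels, and the `route` function are all hypothetical and are not taken from Fable or any Anthropic API.

```python
# Toy illustration only: the flagged terms, model labels, and routing logic
# below are hypothetical and do not reflect Anthropic's implementation.
FLAGGED_TERMS = {"exploit", "malware", "vulnerability", "pathogen"}

PRIMARY_MODEL = "primary-model"    # hypothetical label for the restricted model
FALLBACK_MODEL = "fallback-model"  # hypothetical label for the fallback model


def route(prompt: str) -> str:
    """Return the model label a naive keyword filter would pick for a prompt."""
    # Normalize each word: strip surrounding punctuation and lowercase it.
    words = {word.strip(".,?!:;()\"'").lower() for word in prompt.split()}
    # Any overlap with the flagged terms triggers the fallback,
    # regardless of what the prompt is actually asking for.
    if words & FLAGGED_TERMS:
        return FALLBACK_MODEL
    return PRIMARY_MODEL


benign = "Review this function and fix any vulnerability you find."
unrelated = "Summarize this changelog."

print(route(benign))
print(route(unrelated))
```

Under these assumptions, the defensive request in `benign` is routed to the fallback simply because it contains a flagged word, which is the kind of false positive the researchers describe.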

The controversy highlights tension between AI safety and practical utility, reminding developers that overly broad filters can hinder the very security research they aim to protect.