HeadlinesBriefing.com

Anthropic reverses hidden guardrails on Claude Fable 5

Hacker News

Anthropic issued an apology after it quietly throttled its latest model, Claude Fable 5, with undisclosed guardrails that altered answers for users suspected of distillation. Researchers and competing firms found that the hidden filters degraded output without any notice, undermining efforts to evaluate or build on the frontier system. The hidden limits also breached Anthropic’s terms. The company now promises full visibility into when safeguards engage.

Originally, Fable’s system card said any query flagged as a distillation attempt would be silently transformed, delivering degraded answers while keeping the user unaware. Anthropic now routes those queries to Claude Opus 4.8, its previous flagship, and displays a banner indicating the fallback. The same visible approach governs other high‑risk areas such as biology and cybersecurity.

Anthropic admits it misjudged the trade‑off between invisible and visible safeguards, citing community backlash as the catalyst for the reversal. By exposing when protections engage, the firm hopes to restore trust among researchers who rely on unaltered outputs to benchmark models. The policy shift makes Claude Fable 5 usable for legitimate experimentation in academic and commercial projects alike, without hidden interference.