HeadlinesBriefing.com

Open‑source AI tools strip safety guardrails in minutes

Financial Times Companies

Researchers using the open‑source tool Heretic stripped safety guardrails from Meta’s Llama 3.3 and Google’s Gemma models in minutes. The modified systems answered prompts about chlorine‑gas attacks, credit‑card theft code and child‑abuse narratives, all queries the original models blocked. Tests by the FT and AI safety group Alice show that a four‑line script can disable these protections in under ten minutes, making malicious use practical.

Heretic’s creator, Philipp Emanuel Weidmann, says the utility has produced more than 3,500 “decensored” models since its release, with downloads exceeding 13 million. He removed safeguards from Google’s Gemma 4 within 90 minutes of launch. The ease of “abliteration” poses a challenge for regulators: open‑source AI now rivals proprietary systems, and it can be altered and shared widely without the original developers’ oversight.

Companies have spent millions building guardrails, yet the open‑source surge forces a rethink of risk‑mitigation budgets. OpenAI’s GPT‑OSS approach filters dangerous data out at training time, but critics warn this may leave models unable to recognize malicious use. With tools like Heretic freely available, investors should watch how AI firms allocate resources to secure code and compliance frameworks across cloud and edge deployments.