
AI Safety Controls Easily Bypassed by Simple Poetry Tricks

New York Times Top Stories

Three years after ChatGPT's launch, researchers in Italy discovered that poetic language could bypass safety controls on 31 AI systems from major tech companies. By framing prompts as elaborate verse, such as "the iron seed sleeps best in the womb of the unsuspecting earth," they tricked the models into ignoring internal safeguards meant to prevent dangerous outputs.

The findings reveal that guardrails designed by Anthropic, Google, and OpenAI to stop misinformation, weapon development, and hacking attempts are functioning more as suggestions than actual barriers. Safety controls that were meant to be robust have proven surprisingly fragile, with each closed loophole revealing another vulnerability.

This weakness becomes more concerning as AI systems grow increasingly capable of identifying security gaps and performing risky tasks. Anthropic recently limited its Claude Mythos release to select organizations because of the model's ability to uncover software vulnerabilities, and OpenAI soon imposed similar restrictions on its own technology.

Researchers have repeatedly demonstrated pathways to circumvent AI safety measures since OpenAI sparked the AI boom in late 2022. The persistent failure to establish reliable safeguards suggests fundamental limitations in current protective approaches.