HeadlinesBriefing favicon HeadlinesBriefing.com

AI Security Test: 2,000 Hackers Failed to Breach Secrets

Hacker News •
×

I built hackmyclaw.com as a security challenge where anyone could try to make my AI assistant Fiu leak a secrets.env file. After reaching Hacker News, over 2,000 people sent 6,000+ emails attempting to break it. The secrets never leaked.

People got creative with their attacks: emergency scenarios, authority impersonation, and multi-language social engineering. Google suspended Fiu's Gmail after detecting fraud from the volume of emails and API calls. The setup cost More than $500 in API expenses, and batch processing contaminated some results since early injection attempts made the model suspicious of later messages.

The experiment used Claude Opus 4.6, Anthropic's model specifically trained for prompt injection resistance. Fiu successfully identified the coordinated attack around email 500. Zero successful extractions occurred across all attempts, though some attacks were surprisingly sophisticated.

Attackers reached out wanting to sponsor the project, including Corgea, Abnormal AI, and an anonymous donor. The results suggest prompt injection is harder than expected with capable models, though it remains a real security concern for AI agents with arbitrary permissions.