HeadlinesBriefing.com

Domain‑Camouflaged Injection Cripples LLM Guard Detectors

Hacker News

A study surfaced on Hacker News exposes a blind spot in LLM-agent defenses. Standard injection detectors, tuned to static, template-like directives, miss payloads that imitate the target document's terminology and hierarchy, a technique the authors label domain-camouflaged injection. On Llama 3.1 8B, the detection rate drops from 93.8% to 9.7% against such crafted inputs. The attacks slip past defenses because they blend in with legitimate citations and policy language.

The study formalizes the shortfall as the Camouflage Detection Gap (CDG): the difference between the detection rate on static injections and the detection rate on camouflaged ones. Across 45 tasks in three domains, the CDG is statistically significant for both Llama (χ²=38.03, p<0.001) and Gemini 2.0 Flash (χ²=17.05, p<0.001). Llama Guard 3 scores zero, showing that specialized safety layers fail as well.
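To make the CDG arithmetic concrete, here is a minimal Python sketch. It assumes the gap is a simple difference in percentage points and that the significance test is a Pearson chi-square on a 2x2 table of detected versus missed payloads; the summary does not confirm either detail. The function names are hypothetical, and the counts passed to the chi-square are illustrative only, not taken from the study.

```python
def camouflage_detection_gap(static_rate: float, camouflaged_rate: float) -> float:
    """Gap, in percentage points, between static and camouflaged detection rates."""
    return static_rate - camouflaged_rate


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are conditions (static, camouflaged); columns are (detected, missed).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denom


# Detection rates reported for Llama 3.1 8B in the summary above.
llama_gap = camouflage_detection_gap(93.8, 9.7)
print(f"CDG: {llama_gap:.1f} percentage points")

# Hypothetical counts, NOT from the study: 40 of 45 static payloads detected,
# 10 of 45 camouflaged payloads detected.
chi2 = chi_square_2x2(40, 5, 10, 35)
print(f"chi-square: {chi2:.2f}")
```

With the reported Llama rates the gap comes to 84.1 percentage points. The chi-square value produced here depends entirely on the made-up counts and should not be compared with the published statistics.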

Multi-agent debate frameworks amplify static injections by up to 9.9× on smaller models, while larger models show collective resistance. Targeted augmentation helps unevenly, 10.2% on Llama but 78.7% on Gemini, suggesting the flaw stems from architecture rather than training data. The authors release their task bank, payload generator, and evaluation code for public scrutiny.