HeadlinesBriefing.com

Context‑Aware Token Proxy Shields LLMs from Sensitive Security Data

Hacker News

A team built a “Ghost Analyst” on Anthropic’s Claude to triage Microsoft Sentinel and Defender alerts. The flow pulls Entra ID logs, generates KQL queries and returns a report, but the data it handles includes IPs, usernames and corporate domains that cannot be sent raw to a cloud model. To keep using a cloud model, the team added a DLP layer that pseudonymizes outbound data and restores the original values in inbound responses.
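The outbound/inbound round trip can be pictured with a minimal sketch. This is not the team's code: the `PseudonymMap` class, the `pseudonymize_outbound` function, the email regex and the `userN@example.com` stand-in format are all assumptions made for illustration, and only email addresses are handled here.

```python
import re

# Hypothetical helper: a two-way table between real values and stand-ins.
class PseudonymMap:
    def __init__(self):
        self._forward = {}  # real value -> pseudonym
        self._reverse = {}  # pseudonym -> real value

    def pseudonym_for(self, real, make):
        # Reuse the same stand-in for a repeated value so the model
        # sees one consistent entity across the whole prompt.
        if real not in self._forward:
            fake = make(len(self._forward) + 1)
            self._forward[real] = fake
            self._reverse[fake] = real
        return self._forward[real]

    def restore(self, text):
        # Inbound direction: put the original values back into the reply.
        # Longer pseudonyms are replaced first to avoid partial matches.
        for fake in sorted(self._reverse, key=len, reverse=True):
            text = text.replace(fake, self._reverse[fake])
        return text


# Simplified email pattern, an assumption for this sketch only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_outbound(text, mapping):
    # Outbound direction: swap each email for a stand-in address.
    return EMAIL_RE.sub(
        lambda m: mapping.pseudonym_for(
            m.group(0), lambda n: f"user{n}@example.com"
        ),
        text,
    )
```

In use, the proxy would call `pseudonymize_outbound` on the prompt before it leaves the network and `restore` on the model's reply, keeping the `PseudonymMap` for the lifetime of the conversation.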

The first regex‑only version replaced emails with bracketed tags, which confused Claude and produced hallucinated users like “Sarah Kowalski”. Adding a lightweight spaCy Named Entity Recognition pass allowed detection of person and organization names and produced syntactically valid pseudonyms such as [email protected]. Yet stripping raw values erased clues needed for security reasoning, for example impossible‑travel detection and malicious domain identification.

Version 3 turned the proxy into a translator: IPs are swapped for other addresses in the same ASN using MaxMind GeoLite2, preserving geographic signals; domains are classified as internal, partner or external so Claude can still infer insider‑vs‑outsider communication; a layered skiplist eliminates false positives on Graph permissions and KQL identifiers. The team released the token‑proxy on GitHub, offering a DLP front‑end for cloud LLM workflows.
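A rough sketch of the three Version 3 ideas follows, under loud assumptions: the `ASN_NETWORKS` table is a hypothetical stand-in for a GeoLite2 ASN lookup (a real proxy would query the MaxMind database), the domain sets and the skiplist entries are invented, and the skiplist is a single flat set rather than the layered one the team describes.

```python
import ipaddress
import random

# Hypothetical stand-in for a GeoLite2 ASN lookup: network -> ASN label.
ASN_NETWORKS = {
    ipaddress.ip_network("203.0.113.0/24"): "AS64500",
    ipaddress.ip_network("198.51.100.0/24"): "AS64501",
}

INTERNAL_DOMAINS = {"corp.example"}     # assumed: the organisation's own domains
PARTNER_DOMAINS = {"partner.example"}   # assumed: known partner domains

# Assumed flat skiplist: dotted or name-like tokens that are not sensitive.
SKIPLIST = {"User.Read", "Mail.ReadWrite", "SigninLogs", "TimeGenerated"}


def swap_ip_same_network(ip, rng):
    """Return a different address from the same known network, else None."""
    addr = ipaddress.ip_address(ip)
    for net in ASN_NETWORKS:
        if addr in net:
            while True:
                # Skip the network and broadcast addresses.
                offset = rng.randrange(1, net.num_addresses - 1)
                candidate = net.network_address + offset
                if candidate != addr:
                    return str(candidate)
    return None


def classify_domain(domain):
    """Label a domain as internal, partner or external."""
    d = domain.lower().rstrip(".")
    for label, known in (("internal", INTERNAL_DOMAINS),
                         ("partner", PARTNER_DOMAINS)):
        if any(d == k or d.endswith("." + k) for k in known):
            return label
    return "external"


def pseudonymize_domain(domain, seen):
    """Replace a domain with a stand-in that keeps its trust category."""
    if domain in SKIPLIST:
        return domain  # e.g. a Graph permission, left untouched
    if domain not in seen:
        label = classify_domain(domain)
        n = sum(1 for v in seen.values() if v.startswith(label)) + 1
        seen[domain] = f"{label}-{n}.example"
    return seen[domain]
```

The point of the design is that the stand-in carries the same signal as the original: a swapped IP still resolves to the same network, and a pseudonymized domain still tells the model whether traffic stayed inside the organisation.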