HeadlinesBriefing.com

Pangram 3.3.2 detection model reveals hidden LLM fingerprints

Hacker News

Pangram Labs released Pangram 3.3.2, a bug-fix update to its 2025 detection model. The system classifies documents as human-written or AI-assisted, with reportedly industry-leading low false-positive rates and multilingual support. Researchers published a deep dive showing how internal activations progressively separate the two classes across network layers. Since ChatGPT’s 2022 debut, AI-generated prose has flooded news, reviews and academia, sparking authenticity concerns.

The team built a 5,000-document interpretability set, evenly split between human and AI sources, and extracted 5,120-dimensional activations from every even-numbered layer. Linear probes reach 0.83 accuracy as early as layer 2 and perfect separation by layer 24, indicating that even shallow representations carry a distinguishing signal, at least in this controlled setting.
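To make the layer-wise probing idea concrete, here is a minimal sketch on synthetic data. It is not Pangram's code: the activations are random stand-ins, the dimensions are far smaller than the 5,120 reported, and the probe is a ridge-regularized least-squares classifier chosen only so the example needs nothing beyond NumPy. The function names and the per-layer `separation` values are hypothetical.

```python
import numpy as np

def fit_linear_probe(X, y, l2=1e-3):
    """Fit a ridge-regularized least-squares linear probe for labels y in {0, 1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    targets = 2.0 * y - 1.0                         # map {0, 1} to {-1, +1}
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)

def probe_accuracy(w, X, y):
    """Fraction of held-out examples the probe classifies correctly."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    predictions = (Xb @ w > 0).astype(int)
    return float((predictions == y).mean())

def synthetic_activations(n, dim, separation, rng):
    """Hypothetical stand-in for one layer: two Gaussian classes offset along one direction."""
    y = rng.integers(0, 2, size=n)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    X = rng.normal(size=(n, dim)) + separation * (2 * y - 1)[:, None] * direction
    return X, y

def probe_layers(separations, n_train=400, n_test=400, dim=32, seed=0):
    """Train one probe per layer and report held-out accuracy for each."""
    rng = np.random.default_rng(seed)
    results = {}
    for layer, separation in separations.items():
        X, y = synthetic_activations(n_train + n_test, dim, separation, rng)
        w = fit_linear_probe(X[:n_train], y[:n_train])
        results[layer] = probe_accuracy(w, X[n_train:], y[n_train:])
    return results

# Assumed separations: deeper layers are simulated as more linearly separable.
layer_accuracy = probe_layers({2: 1.0, 12: 2.0, 24: 6.0})
for layer, accuracy in layer_accuracy.items():
    print(f"layer {layer:2d}: held-out probe accuracy = {accuracy:.3f}")
```

With real data, `synthetic_activations` would be replaced by activations extracted from the detector at each probed layer, and accuracy should be measured on documents the probe never saw during fitting.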

Unexpectedly, t-SNE and UMAP visualizations reveal clusters by originating LLM family, even though the model receives no such labels during training. Probes trained on these embeddings reach up to 91% top-1 accuracy across six provider families, suggesting that Pangram 3.3.2 implicitly learns source fingerprints. The findings give engineers a clearer view of what the detector actually encodes.
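For context on the 91% figure: with six classes, uniform guessing would score 1/6, or about 16.7% top-1, assuming balanced classes (the summary does not state the class balance). The sketch below shows how a multi-class linear probe and its top-1 accuracy could be computed. The embeddings are synthetic clusters, not the detector's, and the one-hot least-squares probe is an assumption made to keep the example NumPy-only.

```python
import numpy as np

def fit_multiclass_probe(X, labels, n_classes, l2=1e-3):
    """Fit a ridge-regularized least-squares probe against one-hot class targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    one_hot = np.eye(n_classes)[labels]
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ one_hot)       # shape: (dim + 1, n_classes)

def top1_accuracy(W, X, labels):
    """Fraction of examples whose highest-scoring class matches the true label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float((np.argmax(Xb @ W, axis=1) == labels).mean())

# Synthetic stand-in: six hypothetical "families", each a Gaussian cluster.
rng = np.random.default_rng(0)
n_families, dim, n = 6, 32, 1200
centers = 3.0 * rng.normal(size=(n_families, dim))
labels = rng.integers(0, n_families, size=n)
X = centers[labels] + rng.normal(size=(n, dim))

split = n // 2                                      # first half trains, second half tests
W = fit_multiclass_probe(X[:split], labels[:split], n_families)
family_accuracy = top1_accuracy(W, X[split:], labels[split:])
chance = 1.0 / n_families

print(f"top-1 accuracy: {family_accuracy:.3f} (chance: {chance:.3f})")
```

Reporting the chance baseline next to the probe score makes it easier to judge how much family information the embeddings actually carry.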

The authors plan to extend the probe framework to newer model families and to release an interactive explorer publicly, giving developers a tool to audit detection behavior on their own corpora.