HeadlinesBriefing.com

Gemma Models Reveal Layered Factual Recall Circuit

Towards Data Science

In a recent mechanistic study, researchers dissected how the Gemma family of language models stores and retrieves facts. Using activation patching on 60 prompt pairs (copying internal activations from a run on one prompt into a run on its paired prompt and measuring how much of the correct answer is restored), they mapped a three‑phase circuit inside Gemma‑2B. The results trace how the signal for a fact moves through the layers before the model produces its answer.
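One common way to turn a patching experiment into a single number is sketched below. This is an assumption: the brief does not say which metric the study used. The sketch takes the logit difference LD = logit(correct answer) − logit(incorrect answer) and normalizes it so that 0 means the patch changed nothing and 1 means it fully restored the clean answer:

$$
\text{recovery} = \frac{\mathrm{LD}_{\text{patched}} - \mathrm{LD}_{\text{corrupt}}}{\mathrm{LD}_{\text{clean}} - \mathrm{LD}_{\text{corrupt}}}
$$

Here "clean" is the run on the original prompt, "corrupt" is the run on its paired prompt (for example, with the entity swapped), and "patched" is the corrupted run with one activation copied in from the clean run.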

The analysis splits retrieval into three phases: storage, routing, and readout. In Gemma‑2B, storage happens in layers 0–14 at the entity token, where the residual stream carries most of the causal signal. Routing is spread across many attention heads, none of which is decisive on its own, and readout extracts the encoded answer in the final layers.
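To make the storage, routing, and readout picture concrete, here is a toy sketch. It is entirely hypothetical and is neither the study's code nor Gemma's architecture. The toy is a two‑position "model": early layers amplify a value at the entity token (storage), one layer copies it to the final token (routing), and the output is read from the final token (readout). In Gemma the routing step is spread across many heads; here it is a single step for clarity. Sweeping a residual‑stream patch across layers reproduces the kind of signature the study describes. Patching the entity token restores the answer only up to the routing layer, and patching the final token restores it only afterwards. The names `N_LAYERS`, `ROUTE_LAYER`, `run`, `recovery`, and `sweep` are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical toy settings, not taken from the study.
N_LAYERS = 8          # depth of the toy model
ROUTE_LAYER = 5       # layer whose "attention" copies entity info to the final token
ENTITY, FINAL = 0, 1  # token positions: the entity token and the last prompt token


def run(entity_value: float,
        patch: Optional[Tuple[int, int, float]] = None) -> Tuple[float, List[List[float]]]:
    """Run the toy model and return (logit difference, residual cache).

    cache[layer][pos] is the residual stream entering `layer` at `pos`.
    patch=(layer, pos, value) overwrites that entry before the layer runs.
    """
    resid = [entity_value, 0.0]
    cache: List[List[float]] = []
    for layer in range(N_LAYERS):
        if patch is not None and patch[0] == layer:
            resid[patch[1]] = patch[2]
        cache.append(list(resid))
        if layer < ROUTE_LAYER:
            # Storage: an MLP-like update enriches the entity token's residual stream.
            resid[ENTITY] *= 1.5
        elif layer == ROUTE_LAYER:
            # Routing: attention moves the stored value to the final position.
            resid[FINAL] += resid[ENTITY]
        # Layers after ROUTE_LAYER leave the residual unchanged in this toy.
    # Readout: the answer's logit difference is read from the final position.
    return resid[FINAL], cache


def recovery(clean_ld: float, corrupt_ld: float, patched_ld: float) -> float:
    """Normalized effect: 0 = patch did nothing, 1 = clean answer fully restored."""
    return (patched_ld - corrupt_ld) / (clean_ld - corrupt_ld)


def sweep(pos: int) -> List[float]:
    """Patch the clean residual at `pos` into the corrupted run, one layer at a time."""
    clean_ld, clean_cache = run(1.0)   # stands in for the prompt with the original entity
    corrupt_ld, _ = run(-1.0)          # stands in for the prompt with the entity swapped
    return [
        recovery(clean_ld, corrupt_ld,
                 run(-1.0, patch=(layer, pos, clean_cache[layer][pos]))[0])
        for layer in range(N_LAYERS)
    ]


if __name__ == "__main__":
    print("entity token:", sweep(ENTITY))
    print("final token: ", sweep(FINAL))
```

With these settings, `sweep(ENTITY)` gives 1.0 for layers 0–5 and 0.0 for layers 6–7, and `sweep(FINAL)` gives the mirror image. In a real model the same loop would patch cached activations inside the network, and the boundary between phases would be blurrier.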

Repeating the experiment on Gemma‑12B‑IT recovered the same three‑phase topology, with storage spanning layers 0–27 and attention routing even more distributed. Tokenizer quirks forced a few prompt pairs to be dropped. This shows how sensitive cross‑model comparisons are to tokenization differences, which can skew causal analyses, so developers should validate their prompts on each architecture they compare.
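The brief does not explain the quirks, but one plausible mechanism is assumed here. Activation patching compares activations position by position, so a clean prompt and its corrupted partner must tokenize to the same length. The sketch below uses two hypothetical stand‑in tokenizers and made‑up prompts to show how the same pair can survive under one tokenizer and be dropped under another. In practice, `tokenize` would wrap each model's own tokenizer. The function and prompt names are illustrative.

```python
import re
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (clean prompt, corrupted prompt)


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer A: split on whitespace only."""
    return text.split()


def hyphen_aware_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer B: also splits hyphenated words into pieces."""
    return re.findall(r"[^\s-]+", text)


def filter_aligned_pairs(pairs: List[Pair],
                         tokenize: Callable[[str], List[str]]) -> Tuple[List[Pair], List[Pair]]:
    """Return (kept, dropped). A pair is kept only if both prompts have the same
    token count, so activations can be patched position by position."""
    kept: List[Pair] = []
    dropped: List[Pair] = []
    for clean, corrupt in pairs:
        if len(tokenize(clean)) == len(tokenize(corrupt)):
            kept.append((clean, corrupt))
        else:
            dropped.append((clean, corrupt))
    return kept, dropped


# Made-up example pairs, not from the study.
PAIRS: List[Pair] = [
    ("Paris is the capital of", "Rome is the capital of"),
    ("Coca-Cola was founded in", "Pepsi was founded in"),
]

if __name__ == "__main__":
    for name, tok in [("A", whitespace_tokenize), ("B", hyphen_aware_tokenize)]:
        kept, dropped = filter_aligned_pairs(PAIRS, tok)
        print(name, "kept:", len(kept), "dropped:", dropped)
```

Under tokenizer A both pairs are kept. Under tokenizer B, "Coca-Cola" becomes two tokens, so the second pair is dropped. Running the filter separately for each model before patching makes explicit which pairs each comparison actually uses.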

These findings give engineers a concrete map of where factual knowledge lives inside transformer blocks, which can support targeted debugging and more efficient fine‑tuning. Because the residual stream carries the bulk of the signal, developers building enterprise‑grade AI can concentrate optimization on those pathways rather than overhauling the attention mechanism, with the aim of making models more reliable on knowledge‑heavy tasks.