HeadlinesBriefing.com

Claude's Hidden Thinking: Anthropic Reveals AI's Secret Strategies

ByteByteGo

Anthropic researchers have found that Claude does not always compute the way it says it does. Asked to add 36 and 59, the model describes the standard carry-the-one algorithm, but analysis of its internal activations shows it running parallel computational paths: one estimates the rough magnitude of the sum while another works out the final digit.
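To make the two-path idea concrete, here is a toy Python sketch. It is only an analogy, not a description of Claude's actual circuitry: the function names (`rough_magnitude`, `final_digit`, `combine`) and the choice to blur each operand to the nearest multiple of 3 are assumptions made up for illustration. One path produces an imprecise estimate of the sum, the other computes only the last digit, and a final step picks the number with that last digit closest to the estimate.

```python
def rough_magnitude(a: int, b: int) -> int:
    """Hypothetical 'estimate' path: blur each operand to the nearest
    multiple of 3, so the result is within 2 of the true sum."""
    return 3 * round(a / 3) + 3 * round(b / 3)


def final_digit(a: int, b: int) -> int:
    """Hypothetical 'last digit' path: look only at the ones digits."""
    return (a % 10 + b % 10) % 10


def combine(a: int, b: int) -> int:
    """Pick the number ending in the right digit that is nearest the estimate."""
    estimate = rough_magnitude(a, b)
    digit = final_digit(a, b)
    base = estimate - estimate % 10 + digit
    candidates = (base - 10, base, base + 10)
    return min(candidates, key=lambda c: abs(c - estimate))


print(rough_magnitude(36, 59))  # imprecise estimate of the sum
print(final_digit(36, 59))      # ones digit only
print(combine(36, 59))          # the two paths agree on one answer
```

Neither path alone gives the answer, but together they pin it down, because candidates that share a last digit are 10 apart while the estimate is off by at most 2. Nothing here resembles carrying, which is the point of the contrast with Claude's own explanation.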

This gap between explanation and execution is just the beginning. Using specialized tools that act like a microscope for AI, researchers traced Claude's internal computations across tasks ranging from poetry writing to handling dangerous prompts. They found that individual neurons do not map cleanly to concepts: a single neuron might activate for basketball, for round objects, and for the color orange.

The team developed techniques to decompose neural activity into interpretable 'features' such as smallness or rhyming words. These tools let them intervene in Claude's thought process, suppressing or injecting specific concepts to see how the output changes. The interventions provide causal evidence that Claude plans ahead when writing poetry, choosing rhyming words before composing the lines that lead up to them, rather than building each line word by word.
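The suppress-and-inject idea can be sketched with a toy example in Python. This is an illustrative assumption, not Anthropic's tooling: the three-neuron activation, the `SMALLNESS` and `RHYMING` direction vectors, and the `steer` helper are all hypothetical. Each feature is modeled as a direction spread across several neurons (so no single neuron "is" the concept), and an intervention sets how strongly the activation points along that direction.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def steer(activation, direction, target):
    """Return a copy of `activation` whose component along `direction`
    equals `target` (0.0 suppresses the feature, larger values inject it)."""
    d = unit(direction)
    current = dot(activation, d)
    return [a + (target - current) * x for a, x in zip(activation, d)]


# Hypothetical feature directions over three neurons. Neuron 1 takes part
# in both features, mirroring the "one neuron, many concepts" observation.
SMALLNESS = [1.0, 1.0, 0.0]
RHYMING = [0.0, 1.0, 1.0]

activation = [2.0, 2.5, 0.5]  # made-up activity, not real model data

suppressed = steer(activation, SMALLNESS, 0.0)
injected = steer(activation, RHYMING, 3.0)

print(dot(activation, unit(SMALLNESS)))  # feature strength before
print(dot(suppressed, unit(SMALLNESS)))  # approximately zero after suppression
print(dot(injected, unit(RHYMING)))      # approximately 3.0 after injection
```

In a real study the edited activation would be fed back into the model and the change in output compared against the unedited run. That comparison is what makes the evidence causal rather than correlational.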