HeadlinesBriefing.com

Layer Duplication Boosts Transformer Reasoning by 245% in Devstral-24B

Hacker News

GitHub user AlainNotHere replicated David Ng's RYS method and found that duplicating specific layers in two transformer models significantly improves reasoning. Devstral-24B's logical-deduction score on BBH (BIG-Bench Hard) jumped from 0.22 to 0.76 after duplicating layers 12-14, a 245% improvement, and Qwen2.5-32B gained 23% on reasoning by duplicating layers 7-9. No training or weight changes were needed: hidden states are simply routed through the same circuit twice. Tools to find and duplicate these reasoning circuits are included in the repo. The discovery took one evening on AMD GPUs.
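To make the mechanics concrete, here is a minimal sketch of what "routing hidden states through the same circuit twice" can look like on a Hugging Face LLaMA/Mistral-style checkpoint. The post itself works on GGUF files and its code is not shown here, so the decoder-stack location (model.model.layers), the helper name duplicate_block, and the checkpoint ID are assumptions for illustration, not the repo's implementation.

```python
# Sketch: duplicate a contiguous block of decoder layers so hidden states
# pass through it twice. Assumes a LLaMA/Mistral-style model whose decoder
# stack lives at model.model.layers; no weights are copied or changed.
import torch.nn as nn
from transformers import AutoModelForCausalLM

def duplicate_block(model, start, end):
    """Insert the *same* layer objects for indices start..end a second time."""
    layers = list(model.model.layers)
    block = layers[start : end + 1]
    model.model.layers = nn.ModuleList(
        layers[: end + 1] + block + layers[end + 1 :]
    )
    model.config.num_hidden_layers = len(model.model.layers)
    return model

# Hypothetical usage mirroring the Devstral-24B result (layers 12-14).
# Caveat: reusing the same module twice confuses the per-layer KV cache,
# so evaluate with use_cache=False (or deep-copy the block and fix layer_idx).
model = AutoModelForCausalLM.from_pretrained("mistralai/Devstral-Small-2505")
model = duplicate_block(model, 12, 14)
```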

The finding suggests that transformers contain discrete cognitive units, dubbed 'reasoning circuits': contiguous layer blocks that act as indivisible functional units. Duplicating such a block gives the model a second pass through its reasoning pipeline without degrading other capabilities. Different models keep their circuits in different places: Devstral-24B's sits at layers 12-14, while Qwen2.5-32B's sits at layers 7-9. The boundaries are sharp; shift the block by one layer and the effect disappears. The toolkit allows arbitrary layer duplication patterns, so the same weights can yield different cognitive profiles, such as a math specialist or an EQ specialist.
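The post does not show the toolkit's API, but circuit discovery can be sketched as a brute-force sweep: duplicate every contiguous window, score the model, restore the original stack, and keep the best window. The function names scan_for_circuit and evaluate below are hypothetical; evaluate would be something you supply, e.g. accuracy on a BBH logical-deduction subset. It reuses duplicate_block from the sketch above.

```python
# Hypothetical brute-force circuit scan (not the repo's actual tooling).
import torch.nn as nn

def scan_for_circuit(model, evaluate, window=3):
    original = list(model.model.layers)
    n = len(original)
    scores = {}
    for start in range(n - window + 1):
        end = start + window - 1
        duplicate_block(model, start, end)            # second pass through window
        scores[(start, end)] = evaluate(model)
        model.model.layers = nn.ModuleList(original)  # restore the stack
        model.config.num_hidden_layers = n
    # Sharp boundaries should appear as one window scoring far above
    # its one-layer-shifted neighbors.
    return max(scores.items(), key=lambda kv: kv[1])
```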

Practical applications include boosting reasoning on consumer hardware: the open-source tools operate on GGUF files, so researchers can find circuits in any GGUF model and apply custom duplication. Ng's original RYS method inspired this work, which extends his findings with new circuit-discovery techniques. The project demonstrates how transformer architectures can be subtly altered for performance gains without retraining, opening new avenues for model optimization.