DeepSeek revamps residual connections with manifold‑constrained hyper‑connections

Towards Data Science

DeepSeek researchers have released a paper proposing Manifold‑Constrained Hyper‑Connections (mHC) as a replacement for the decade‑old residual connections that still underpin most AI models. Standard residual links, introduced with ResNets in 2015, pass each layer's input forward unchanged so gradients flow cleanly, but that single fixed‑width shortcut becomes a bottleneck as layers and parameters swell. The new design seeks to keep the shortcut’s stability while widening its expressive capacity for modern models.

Hyper‑Connections widen the shortcut by a factor of n, creating n parallel streams that are compressed into a single input before each attention or MLP block and expanded back afterward. ByteDance introduced the idea in 2024, but DeepSeek found two fatal flaws: the unconstrained residual‑mapping matrix breaks the identity property, causing signal amplification of up to 3,000×, and the wider stream overloads GPU memory bandwidth, cancelling much of the theoretical gain.
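
To make the compress, process, expand pattern concrete, here is a minimal pure‑Python sketch of one hyper‑connection step and of how an unconstrained mixing matrix can amplify the signal along the shortcut. It is an illustration under stated assumptions, not DeepSeek's or ByteDance's implementation: the names `h_pre`, `h_post`, `h_res`, the 2×2 example matrices, and the depth of 30 are all hypothetical.

```python
def hyper_connection_step(streams, h_pre, h_post, h_res, layer):
    """One layer with n parallel residual streams (each a list of d floats)."""
    n, d = len(streams), len(streams[0])
    # Compress: weighted sum of the n streams into a single layer input.
    x = [sum(h_pre[i] * streams[i][k] for i in range(n)) for k in range(d)]
    y = layer(x)
    # Shortcut: mix the streams with the n x n residual-mapping matrix.
    mixed = [[sum(h_res[i][j] * streams[j][k] for j in range(n))
              for k in range(d)] for i in range(n)]
    # Expand: write the layer output back into every stream.
    return [[mixed[i][k] + h_post[i] * y[k] for k in range(d)]
            for i in range(n)]


def residual_gain(h_res, depth):
    """Signal growth along the shortcut alone (layer output forced to zero)."""
    n = len(h_res)
    streams = [[1.0] for _ in range(n)]
    zero_layer = lambda x: [0.0] * len(x)
    for _ in range(depth):
        streams = hyper_connection_step(
            streams, [1.0 / n] * n, [1.0] * n, h_res, zero_layer)
    return max(abs(s[0]) for s in streams)


# Hypothetical mixing matrices for n = 2 streams.
identity = [[1.0, 0.0], [0.0, 1.0]]        # classic residual behaviour
unconstrained = [[0.9, 0.4], [0.4, 0.9]]   # rows sum to 1.3, not 1

print(residual_gain(identity, 30))       # signal magnitude is unchanged
print(residual_gain(unconstrained, 30))  # grows geometrically with depth
```

With the identity matrix the shortcut passes the signal through untouched, while a matrix whose rows sum to slightly more than one compounds that excess at every layer, which is the kind of runaway amplification described above.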

To restore stability, the team projects the residual matrix onto the Birkhoff polytope, the set of doubly stochastic matrices, whose entries are non‑negative and whose rows and columns each sum to one. This constraint caps the spectral norm at one, so the shortcut cannot amplify the signal even across dozens of layers, and it ensures compositional closure: the product of doubly stochastic matrices is itself doubly stochastic, so the stacked mapping stays stable. The result is a wider, mathematically safe shortcut that runs without overwhelming GPU memory in practice.
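
The properties claimed for doubly stochastic matrices can be checked numerically. The sketch below uses Sinkhorn‑Knopp normalization (alternately rescaling rows and columns of a positive matrix) as one common way to reach an approximately doubly stochastic matrix; whether this matches the paper's exact projection procedure is an assumption here, and the matrices `raw_a`, `raw_b` and the iteration count are hypothetical.

```python
import math


def sinkhorn(raw, iters=50):
    """Approximately map a square matrix to a doubly stochastic one."""
    n = len(raw)
    # Exponentiate so every entry is strictly positive.
    m = [[math.exp(v) for v in row] for row in raw]
    for _ in range(iters):
        # Rescale rows to sum to 1, then columns to sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def max_sum_error(m):
    """Largest deviation of any row or column sum from 1."""
    n = len(m)
    rows = [sum(row) for row in m]
    cols = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return max(abs(s - 1.0) for s in rows + cols)


# Hypothetical unconstrained mixing matrices for n = 3 streams.
raw_a = [[0.5, -1.0, 0.2], [1.0, 0.0, -0.5], [-0.3, 0.8, 0.1]]
raw_b = [[-0.7, 0.4, 0.9], [0.2, 0.6, -1.0], [0.3, -0.2, 0.5]]
a, b = sinkhorn(raw_a), sinkhorn(raw_b)

x = [3.0, -1.0, 2.0]
print(max_sum_error(a))             # rows and columns sum to ~1
print(l2(matvec(a, x)) <= l2(x))    # mixing does not increase the L2 norm
print(max_sum_error(matmul(a, b)))  # the product is again doubly stochastic
```

The norm check follows from a standard bound: the spectral norm is at most the geometric mean of the maximum absolute row sum and the maximum absolute column sum, both of which equal one for a doubly stochastic matrix.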