HeadlinesBriefing.com

OpenAI unveils MRC protocol to speed AI supercomputer networking

OpenAI Blog

OpenAI released the Multipath Reliable Connection (MRC) protocol today via the Open Compute Project, aiming to cut latency and boost resilience in the massive GPU clusters that train frontier models. The effort brings together AMD, Broadcom, Intel, Microsoft and NVIDIA, all of whom contributed hardware and expertise to the new networking stack for the AI research community.

A single training step can trigger millions of data transfers; one delayed packet stalls GPUs and ripples through the whole job. Conventional RoCE networks pin each flow to a single 800 Gb/s link, creating hotspots and making any link failure a potential job-wide crash. MRC splits an interface into up to eight 100 Gb/s planes, enabling two-tier topologies that connect roughly 131,000 GPUs while cutting power and component count.
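The 131,000 figure is consistent with simple leaf-spine arithmetic. The sketch below is a back-of-the-envelope check, not OpenAI's published sizing: it assumes a non-blocking two-tier fabric and a hypothetical 51.2 Tb/s switch, neither of which the article states. Slower ports mean more ports per switch, and two-tier capacity grows with the square of the port count.

```python
def two_tier_endpoints(radix: int) -> int:
    """Max endpoints in a non-blocking two-tier leaf-spine fabric.

    Each leaf switch uses half its ports for endpoints and half for uplinks.
    Each spine port connects to one leaf, so the fabric holds `radix` leaves.
    """
    return radix * (radix // 2)


# Assumption (not from the article): a switch with 51.2 Tb/s total capacity.
SWITCH_CAPACITY_GBPS = 51_200

for port_speed_gbps in (800, 100):
    radix = SWITCH_CAPACITY_GBPS // port_speed_gbps
    endpoints = two_tier_endpoints(radix)
    print(f"{port_speed_gbps} Gb/s ports: radix {radix}, {endpoints} endpoints per plane")

# 800 Gb/s ports: radix 64, 2048 endpoints per plane
# 100 Gb/s ports: radix 512, 131072 endpoints per plane
```

Under these assumptions, each of the eight 100 Gb/s planes is its own two-tier network reaching 131,072 endpoints, so a GPU keeps its full 800 Gb/s of aggregate bandwidth by attaching one port to every plane.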

By spraying packets across dozens of paths and using SRv6 source routing, MRC steers traffic around congestion, quickly takes faulty links out of service, and then probes them so they can be restored once they recover. Deployments on OpenAI’s NVIDIA GB200 supercomputers in Texas and on Microsoft’s Fairwater clusters have already produced faster, more predictable training runs. The open specification invites other firms to adopt the same resilient fabric.
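The spray, retire, and probe cycle described above can be illustrated with a toy model. This is a minimal sketch and not the MRC specification: the `PathSprayer` class and its methods are hypothetical names, path choice is uniformly random, and SRv6 encoding and congestion signals are not modeled.

```python
import random
from typing import Callable


class PathSprayer:
    """Toy model: spray packets over healthy paths, retire failed ones, probe to restore."""

    def __init__(self, num_paths: int, seed: int = 0) -> None:
        self.healthy = set(range(num_paths))
        self.retired: set[int] = set()
        self._rng = random.Random(seed)

    def pick_path(self) -> int:
        """Choose a path for the next packet from the healthy set."""
        if not self.healthy:
            raise RuntimeError("no healthy paths available")
        return self._rng.choice(sorted(self.healthy))

    def report_loss(self, path: int) -> None:
        """Retire a path after a failure so no new packets use it."""
        if path in self.healthy:
            self.healthy.remove(path)
            self.retired.add(path)

    def probe(self, is_up: Callable[[int], bool]) -> None:
        """Probe retired paths and return any that respond to service."""
        for path in sorted(self.retired):
            if is_up(path):
                self.retired.remove(path)
                self.healthy.add(path)


sprayer = PathSprayer(num_paths=8)
sprayer.report_loss(3)                       # path 3 fails and is retired
used = {sprayer.pick_path() for _ in range(1000)}
print(3 in used)                             # False: traffic avoids the retired path
sprayer.probe(lambda path: True)             # probe succeeds, path 3 returns
print(3 in sprayer.healthy)                  # True
```

Because each packet picks its own path, losing one link removes only a fraction of capacity instead of stalling every flow that was pinned to it.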