OpenAI and its hardware partners release MRC, a new networking protocol for giant AI clusters

news

OpenAI says its new MRC protocol, released with AMD, Broadcom, Intel, Microsoft, and NVIDIA, is designed to make large AI training networks faster, simpler, and more resilient — and the announcement is getting traction on X for good reason.

Official AMD image used on its MRC announcement page about AI networking at scale

What happened

OpenAI has announced MRC (Multipath Reliable Connection), a new networking protocol built for very large AI training clusters, and released it alongside AMD, Broadcom, Intel, Microsoft, and NVIDIA. The short version is that OpenAI wants to move huge amounts of GPU traffic across supercomputer networks with fewer stalls, fewer routing headaches, and better failure tolerance when links or switches misbehave.

This is a meaningful infrastructure story, not just a branding exercise. Large model training jobs can waste an enormous amount of expensive GPU time when network congestion or link failures ripple through the cluster. OpenAI is arguing that MRC is one of the pieces it now needs to keep frontier-scale training systems efficient as cluster sizes keep climbing.

What the official source confirms

OpenAI's official engineering post says MRC was developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA and has now been released through the Open Compute Project for wider industry use. OpenAI says the protocol is designed to improve both performance and resilience in large training networks by spreading traffic across many paths, reacting to failures in microseconds, and simplifying the control plane with SRv6-based source routing.

The same post also makes two practical claims that matter more than the protocol name itself. First, OpenAI says MRC is already deployed across its largest NVIDIA GB200 supercomputers used for frontier model training, including systems with Oracle Cloud Infrastructure in Abilene, Texas, and Microsoft's Fairwater supercomputers. Second, OpenAI says MRC has already been used to train multiple OpenAI models in production-scale environments.

OpenAI's public paper and OCP release go a step further by framing MRC as an open specification rather than a one-off internal trick. That matters because it suggests OpenAI wants this to become shared infrastructure across the AI hardware ecosystem, not just a proprietary optimization hidden inside one vendor stack.

Why the story is trending on X

The story picked up attention on X after @OpenAI framed MRC as a concrete new protocol release rather than a vague research teaser. That post also named a heavyweight partner list — AMD, Broadcom, Intel, Microsoft, and NVIDIA — which naturally widened the audience beyond OpenAI watchers into cloud, infra, networking, and GPU circles.

This is the kind of post that travels well on X because it gives technical people something specific to react to: an open protocol, a production deployment claim, and a direct tie to the economics of frontier AI training. It is not just “AI got better.” It is a more interesting claim: OpenAI thinks network design itself is now a first-class bottleneck for model progress.

What this means for developers, builders, or product teams

Most app developers will never touch MRC directly, but the signal still matters. The biggest AI companies are no longer only competing on models and chips. They are increasingly competing on the plumbing between those chips. If MRC works as advertised, it could reduce wasted GPU time, shorten training interruptions, and make extremely large training systems less brittle.

For infrastructure teams, the more interesting angle is architectural. OpenAI is arguing for a simpler and more deterministic network control model, with multi-plane designs and source routing replacing a lot of the operational fragility that shows up in conventional large-scale fabrics. If that approach spreads, it could shape how future AI clusters are built across cloud and enterprise environments.

For product teams using frontier models indirectly, better training infrastructure usually means faster model iteration, fewer infrastructure-induced delays, and eventually lower pressure on the cost of shipping bigger systems. It is not a consumer-facing feature today, but it is the kind of backend change that can influence what model providers can deliver tomorrow.

What remains unclear

A few things are still unresolved from the outside. OpenAI has shared strong engineering claims, but it has not published the kind of broad independent benchmarking that would make it easy to compare MRC against other large-scale Ethernet or InfiniBand-style approaches across many environments.

It is also not clear how quickly the wider industry will adopt MRC beyond the immediate partner ecosystem that helped develop it. Open specifications are useful, but standardization and broad deployment are different things.

And while OpenAI says MRC is already used in frontier training, the real long-term question is whether it becomes a durable shared standard for AI supercomputing or mainly a protocol that makes the most sense at the absolute top end of cluster scale.

Sources