HeadlinesBriefing.com

OpenAI's WebRTC Architecture Solves Kubernetes Scaling for Voice AI

ByteByteGo

OpenAI serves voice AI to 900 million weekly users using WebRTC, but deploying this protocol on Kubernetes creates unique challenges. WebRTC assumes stable IPs and ports, while Kubernetes treats compute as disposable. Most WebRTC architectures use Selective Forwarding Units for multiparty calls, but OpenAI's workload is predominantly 1:1 conversations between users and models.

The core problem breaks into two constraints: port exhaustion and state stickiness. A traditional deployment requires one UDP port per session, which means exposing tens of thousands of public ports at scale. Cloud load balancers weren't designed to expose that many ports. Additionally, the ICE and DTLS protocols demand stateful handling: packets must reach the same process that initiated the session, or the handshake fails.
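The stickiness constraint can be illustrated with a toy model. The sketch below is not OpenAI's code; the packet dictionaries, backend names, and the `handshake_succeeds` check are all hypothetical stand-ins. It only shows why a balancer that spreads packets evenly breaks a multi-packet handshake, while routing keyed on a session identifier keeps every packet on one process.

```python
import hashlib
from itertools import cycle
from typing import Dict, List


def round_robin_route(packets: List[Dict[str, str]], backends: List[str]) -> List[str]:
    """Spread packets evenly across backends, ignoring which session they belong to."""
    rotation = cycle(backends)
    return [next(rotation) for _ in packets]


def sticky_route(packets: List[Dict[str, str]], backends: List[str]) -> List[str]:
    """Pick the backend from a hash of the session key, so a session never moves."""
    def pick(session_id: str) -> str:
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        return backends[int.from_bytes(digest[:4], "big") % len(backends)]

    return [pick(packet["session"]) for packet in packets]


def handshake_succeeds(routes: List[str]) -> bool:
    """Toy rule: the handshake completes only if one process saw every packet."""
    return len(set(routes)) == 1


# Hypothetical three-packet handshake for a single session.
handshake = [{"session": "s1", "step": str(i)} for i in range(3)]
pods = ["pod-a", "pod-b", "pod-c"]

rr_routes = round_robin_route(handshake, pods)
sticky_routes = sticky_route(handshake, pods)
```

Under these assumptions the round-robin routes land on three different pods and the handshake check fails, while the keyed routes all name the same pod. Real deployments also have to cope with pods being replaced, which this sketch does not model.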

OpenAI's engineers split their architecture into a stateless relay at the geographic edge and a stateful transceiver that handles protocol termination. The relay reads just enough packet data to choose a destination from the ICE username fragment (ufrag), then forwards the encrypted audio without decrypting it. This keeps the client experience unchanged while taking the per-session port and stickiness problems off the Kubernetes side.
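As a rough sketch of what routing on the ufrag could look like, the code below parses a STUN Binding request (the header layout and the USERNAME attribute follow the STUN specification) and takes the part of the username before the colon, which in ICE connectivity checks is the receiver's ufrag. Everything about the routing convention is an assumption made for illustration: the source does not say how OpenAI maps a ufrag to a transceiver, so the four-character prefix, the `backends` table, and the function names are hypothetical. How follow-on DTLS and SRTP packets, which carry no ufrag, are associated with a session is not covered here.

```python
import struct
from typing import Dict, Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442   # fixed value in every STUN header
STUN_BINDING_REQUEST = 0x0001    # message type used by ICE connectivity checks
ATTR_USERNAME = 0x0006           # attribute carrying "receiver_ufrag:sender_ufrag"


def extract_server_ufrag(packet: bytes) -> Optional[str]:
    """Return the receiver-side ufrag from a STUN Binding request, else None."""
    if len(packet) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if msg_type != STUN_BINDING_REQUEST or cookie != STUN_MAGIC_COOKIE:
        return None
    end = min(20 + msg_len, len(packet))
    offset = 20  # attributes start after the 20-byte header
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        start = offset + 4
        value = packet[start:start + attr_len]
        if attr_type == ATTR_USERNAME:
            if len(value) != attr_len:
                return None  # truncated attribute
            return value.decode("utf-8", errors="replace").split(":", 1)[0]
        offset = start + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    return None


def pick_transceiver(
    packet: bytes, backends: Dict[str, Tuple[str, int]]
) -> Optional[Tuple[str, int]]:
    """Hypothetical convention: the first 4 ufrag characters name the transceiver."""
    ufrag = extract_server_ufrag(packet)
    if ufrag is None:
        return None
    return backends.get(ufrag[:4])


def build_binding_request(username: str) -> bytes:
    """Build a minimal Binding request with only a USERNAME attribute (for demos)."""
    value = username.encode("utf-8")
    padded = value + b"\x00" * (-len(value) % 4)
    attr = struct.pack("!HH", ATTR_USERNAME, len(value)) + padded
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, len(attr), STUN_MAGIC_COOKIE)
    return header + bytes(12) + attr  # 12 zero bytes stand in for the transaction ID


demo_packet = build_binding_request("tx07abcd:clientfrag")
demo_backends = {"tx07": ("10.0.0.7", 5000)}
```

With the demo values above, `pick_transceiver(demo_packet, demo_backends)` returns the `tx07` address. The relay in this sketch never touches message integrity or encrypted payloads, which matches the idea of reading only enough to route.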

Built on the Pion library with contributions from WebRTC pioneers Justin Uberti and Sean DuBois, this design lets backend services remain ordinary rather than forcing them into WebRTC peer roles. The result is low-latency voice AI that feels conversational rather than walkie-talkie awkward.