HeadlinesBriefing.com

OpenAI's WebRTC Architecture Powers Real-Time Voice AI at Massive Scale

OpenAI Blog

OpenAI rebuilt its WebRTC infrastructure to deliver conversational voice AI at a scale of 900 million weekly active users. The team tackled challenges in connection setup speed, global routing, and keeping media streams low-latency enough that conversation with the model feels natural.

Traditional one-port-per-session WebRTC models clashed with OpenAI's Kubernetes infrastructure, forcing pods to expose large UDP port ranges and creating operational complexity. Stateful ICE and DTLS sessions also needed stable ownership as traffic scaled. The solution splits relay and transceiver responsibilities while preserving standard WebRTC behavior for clients.

The new architecture uses a transceiver model where edge services terminate client connections and convert media into simpler protocols for backend inference. Built on Pion's Go implementation, this single service powers ChatGPT voice and the Realtime API. It handles both signaling negotiations and media termination while scaling like ordinary Kubernetes workloads.

This engineering approach demonstrates how real-time AI infrastructure must evolve beyond peer-to-peer calling patterns. By separating concerns between client-facing WebRTC termination and internal model orchestration, OpenAI achieved the sub-200ms response times needed for truly conversational voice interactions.