HeadlinesBriefing.com

OpenAI adds real‑time voice models to API for developers

OpenAI Blog

OpenAI rolled out three new audio models in its API, giving developers a toolkit for real‑time voice applications. GPT‑Realtime‑2 delivers GPT‑5‑class reasoning in live conversations, GPT‑Realtime‑Translate handles live translation from 70+ input languages into 13 output languages, and GPT‑Realtime‑Whisper streams speech‑to‑text as users talk. The launch targets use cases from in‑car assistants to multilingual support desks.
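The three models map cleanly onto distinct use cases. As a minimal sketch, a developer might route requests to the right model like this; the lowercase model identifiers and the routing helper are illustrative assumptions, not OpenAI's documented API strings:

```python
# Hypothetical routing of use cases to the three realtime audio models
# named in the announcement. Model ID strings are assumed, not official.
REALTIME_MODELS = {
    "conversation": "gpt-realtime-2",         # live GPT-5-class reasoning
    "translation": "gpt-realtime-translate",  # 70+ inputs -> 13 output languages
    "transcription": "gpt-realtime-whisper",  # streaming speech-to-text
}

def pick_model(use_case: str) -> str:
    """Return the realtime model suited to a use case, or raise if unknown."""
    try:
        return REALTIME_MODELS[use_case]
    except KeyError:
        raise ValueError(f"unsupported use case: {use_case!r}")
```

An in‑car assistant would route to `conversation`, a multilingual support desk to `translation`.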

Early adopters such as Zillow and Deutsche Telekom are already testing the models. Zillow’s voice agent can locate homes, avoid busy streets, and schedule tours without breaking the dialogue, while Deutsche Telekom is experimenting with cross‑language calls that keep pace with speakers. To keep interactions smooth, OpenAI also added features such as preambles, parallel tool calls, longer 128K context windows, and adjustable reasoning levels.
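Those session‑level features could plausibly surface as configuration options. The sketch below captures them in a plain config dict; every field name here is a hypothetical stand‑in, not OpenAI's actual API schema:

```python
# Illustrative session config for the features the article lists:
# adjustable reasoning level, parallel tool calls, 128K context window.
# All field names are assumptions for the sketch, not OpenAI's schema.
def realtime_session_config(reasoning: str = "high") -> dict:
    levels = {"low", "medium", "high", "xhigh"}  # "high"/"xhigh" per the benchmarks cited
    if reasoning not in levels:
        raise ValueError(f"reasoning must be one of {sorted(levels)}")
    return {
        "model": "gpt-realtime-2",
        "reasoning_level": reasoning,    # adjustable reasoning effort
        "parallel_tool_calls": True,     # several tool calls in one turn
        "max_context_tokens": 128_000,   # longer 128K context window
    }
```

A support‑desk agent might trade latency for quality by stepping `reasoning` up from `"high"` to `"xhigh"` only on hard turns.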

Benchmarks show GPT‑Realtime‑2 (high) improves Big Bench Audio scores by 15.2% over its predecessor, and the xhigh setting lifts Audio MultiChallenge results by 13.8%. Developers can now build agents that reason, translate, transcribe, and act within a single spoken turn, moving voice interfaces from simple prompts toward genuinely productive assistants.

By exposing these capabilities through a single API, OpenAI lowers the barrier for startups and enterprises to embed sophisticated voice AI without assembling separate pipelines. Pricing remains tied to existing usage tiers, and the models respect the same safety guardrails that power ChatGPT. Companies can now launch voice‑first products that understand context, handle interruptions, and deliver multilingual support out of the box.