HeadlinesBriefing.com

OpenAI API Speed Boost with WebSockets

OpenAI Blog

OpenAI redesigned their Responses API to accelerate agentic workflows, moving from dozens of synchronous HTTP requests to persistent WebSocket connections. The redesign eliminates redundant processing while keeping a familiar API shape through the previous_response_id parameter.
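The core idea, chaining turns by previous_response_id so the server reuses already-rendered history instead of reprocessing it on every request, could be sketched roughly as follows. This is a toy in-memory model for illustration only: the class, fields, and caching scheme are assumptions, not the actual OpenAI Responses API.

```python
# Toy model (assumption, not the real API): a server that caches the
# rendered token prefix per response id, so a chained turn reuses the
# prefix instead of re-rendering the whole conversation history.

class ResponsesServer:
    def __init__(self):
        self._cache = {}         # response_id -> rendered token sequence
        self._next_id = 0
        self.tokens_rendered = 0 # total rendering work actually performed

    def respond(self, new_tokens, previous_response_id=None):
        if previous_response_id is not None:
            # Cache hit: prior history is reused, not re-rendered.
            prefix = self._cache[previous_response_id]
        else:
            prefix = []
        # Only the new turn's tokens incur rendering work.
        self.tokens_rendered += len(new_tokens)
        full = prefix + new_tokens
        rid = f"resp_{self._next_id}"
        self._next_id += 1
        self._cache[rid] = full
        return rid, full

server = ResponsesServer()
rid, _ = server.respond(["t"] * 100)                            # first turn
rid, _ = server.respond(["t"] * 100, previous_response_id=rid)  # chained turn
print(server.tokens_rendered)  # 200: history was never re-rendered
```

Without the chaining, the second call would have to resend and re-render all 200 tokens; with it, each turn only pays for its own new tokens, which is the redundant-processing elimination the article describes.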

The optimization delivered 40% faster end-to-end performance, with throughput jumping from 65 to nearly 1,000 tokens per second. Engineers achieved this by caching rendered tokens, reducing network hops, and improving the efficiency of the safety stack.

Results showed immediate impact, with GPT-5.3-Codex-Spark hitting 1,000 TPS (bursts up to 4,000 TPS). Vercel integrated the changes into its AI SDK and saw a 40% latency reduction, while Cline's multi-file workflows improved by 39%, demonstrating real-world gains across the developer ecosystem.