
OpenAI's 1,000 tokens/sec coding model runs on Cerebras chips

Ars Technica

On Thursday, OpenAI released its first production AI model to run on non-Nvidia hardware, deploying the new GPT-5.3-Codex-Spark coding model on chips from Cerebras. The model generates code at more than 1,000 tokens per second, reportedly about 15 times faster than its predecessor. For comparison, Anthropic's Claude Opus 4.6 in its new premium-priced fast mode reaches roughly 2.5 times its standard speed of 68.2 tokens per second, or about 170 tokens per second.
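The speed figures above translate directly into wall-clock time for a coding session. As a rough illustration, here is a back-of-envelope sketch using the article's reported throughput numbers; the 10,000-token job size is an arbitrary assumption, not something from the article:

```python
# Throughput figures as reported in the article (tokens per second).
speeds = {
    "GPT-5.3-Codex-Spark": 1000.0,          # "more than 1,000 tokens/sec"
    "Claude Opus 4.6 (fast mode)": 68.2 * 2.5,  # ~2.5x its standard speed
    "Claude Opus 4.6 (standard)": 68.2,
}

job_tokens = 10_000  # hypothetical output size for one coding task

for name, tps in speeds.items():
    seconds = job_tokens / tps
    print(f"{name}: {seconds:.1f} s to emit {job_tokens:,} tokens")
```

At those rates, a 10,000-token response that takes about 10 seconds on Spark would take nearly a minute in Opus 4.6's fast mode and close to two and a half minutes at its standard speed.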

Codex-Spark is a research preview available to ChatGPT Pro subscribers ($200/month) through the Codex app, command-line interface, and VS Code extension. OpenAI is rolling out API access to select design partners. The model ships with a 128,000-token context window and handles text only at launch. The release builds on the full GPT-5.3-Codex model that OpenAI launched earlier this month, with the full model handling heavyweight agentic coding tasks.

On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks for evaluating software engineering ability, Spark reportedly outperforms the older GPT-5.1-Codex-mini while completing tasks in a fraction of the time, according to OpenAI. The company did not share independent validation of those numbers. Anecdotally, Codex's speed has been a sore spot; when Ars tested four AI coding agents building Minesweeper clones in December, Codex took roughly twice as long as Anthropic's Claude Code to produce a working game.