
OpenAI Launches GPT-5.3-Codex-Spark on Cerebras Hardware

A smaller, faster Codex variant hits 1,000+ tokens per second for real-time coding.

Andrés Martínez, AI Content Writer
February 15, 2026 · 2 min read
[Image: Abstract visualization of high-speed data streams flowing through a wafer-scale processor chip]

OpenAI released a research preview of GPT-5.3-Codex-Spark on February 12, a stripped-down version of GPT-5.3-Codex built to run on Cerebras' Wafer Scale Engine 3. The pitch: coding fast enough to feel like pair programming, not waiting on a queue. According to OpenAI's blog post, the model pushes past 1,000 tokens per second, though that figure depends on the right hardware configuration.

Speed was only half the problem. OpenAI says it overhauled its inference stack to cut latency across the board: a persistent WebSocket connection, a rewritten streaming pipeline, and reworked session initialization. The company reports 80% less overhead per roundtrip, 30% less per-token overhead, and 50% faster time-to-first-token. Those numbers are self-reported and haven't been independently verified, but early testers like Simon Willison say the model feels noticeably faster in practice. In a side-by-side demo, a "build a snake game" task took 9 seconds on Spark versus 43 seconds on standard Codex.
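Why both throughput and time-to-first-token matter can be seen with a back-of-envelope model: total wall-clock time is roughly startup latency plus tokens divided by generation speed. The sketch below uses purely illustrative numbers (the TTFT values and the 200 tok/s baseline are assumptions, not figures from OpenAI):

```python
def response_time(ttft_s: float, n_tokens: int, tokens_per_s: float) -> float:
    """Rough wall-clock estimate: time-to-first-token plus streaming time."""
    return ttft_s + n_tokens / tokens_per_s

# Hypothetical comparison: a 2,000-token response at 1,000 tok/s
# versus a slower baseline at 200 tok/s with higher startup latency.
fast = response_time(ttft_s=0.5, n_tokens=2000, tokens_per_s=1000)  # 2.5 s
slow = response_time(ttft_s=1.0, n_tokens=2000, tokens_per_s=200)   # 11.0 s
print(f"fast: {fast:.1f}s, slow: {slow:.1f}s")
```

The point of the model: at 1,000+ tokens per second, generation time mostly stops being the bottleneck, which is why OpenAI also had to attack per-roundtrip and startup overhead to make the whole loop feel instant.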

There's a catch. Spark is a smaller model, and it shows. On SWE-Bench Pro it outperforms GPT-5.1-Codex-mini but falls short of the full GPT-5.3-Codex. It's text-only at launch with a 128k context window. Sachin Katti, OpenAI's Head of Industrial Compute, called the Cerebras partnership "a new platform capability," which is corporate-speak for: this is OpenAI's first production deployment away from Nvidia silicon. Per Cerebras' announcement, the collaboration traces back to a partnership announced in January.

ChatGPT Pro subscribers can try Spark now in the Codex app, CLI, or VS Code extension. API access is rolling out to select partners. No word yet on when it hits broader tiers, though OpenAI says Codex already has over 1 million weekly active users.

Bottom Line

Codex-Spark delivers 1,000+ tokens per second for real-time coding but trades capability for speed, underperforming the full GPT-5.3-Codex on benchmarks.

Quick Facts

  • 1,000+ tokens per second (company-reported, hardware-dependent)
  • 128k context window, text-only at launch
  • 80% roundtrip overhead reduction, 50% faster time-to-first-token (company-reported)
  • Available to ChatGPT Pro users in Codex app, CLI, and VS Code
  • First OpenAI production model running on non-Nvidia hardware (Cerebras WSE-3)
Tags: OpenAI, Codex, Cerebras, real-time coding, AI inference, GPT-5.3, developer tools
Andrés Martínez
AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.


