The race for artificial intelligence usually focuses on making models smarter, larger, and more capable of complex thought. OpenAI just released a tool that focuses entirely on making them faster. This new model runs on a massive chip from a company called Cerebras rather than the standard hardware used for most AI training. It suggests that the next bottleneck for AI companies is not just intelligence but the sheer amount of time it takes to get an answer.
Key Takeaways
- OpenAI released GPT-5.3-Codex-Spark, a lightweight model designed for low-latency inference.
- The tool is powered by Cerebras’ Wafer Scale Engine 3, which contains 4 trillion transistors.
- OpenAI and Cerebras entered a multi-year hardware agreement worth over $10 billion.
The new model is called GPT-5.3-Codex-Spark. It is a stripped-down version of the agentic coding tool OpenAI released earlier this month. The company describes it as a “smaller version” built specifically for speed. While the original model handles heavy lifting, Spark is designed for “rapid iteration” and real-time work.
This release is the first public result of a massive partnership between OpenAI and chipmaker Cerebras. The two companies signed a multi-year deal worth over $10 billion to integrate Cerebras hardware into OpenAI’s infrastructure. CEO Sam Altman hinted at the launch on social media, noting that the new tool “sparks joy” for him.
The big deal
Speed changes how people use software. When an AI takes ten seconds to think, you treat it like a consultant. When it answers instantly, you treat it like a tool. OpenAI is trying to move its coding products into that second category. By splitting its offering into a “deep reasoning” mode and a “rapid response” mode, the company is acknowledging that users do not always need a genius. Sometimes they just need a fast answer.
The hardware bet is equally significant. The industry relies heavily on standard GPU clusters, but OpenAI is putting serious capital into Cerebras and its Wafer Scale Engine 3. This chip is enormous, holding 4 trillion transistors, and is built specifically to move data quickly. The partnership validates Cerebras, a company that recently raised $1 billion and is eyeing an IPO, as a serious player in the hardware market.
How it works
The system relies on a specialized chip designed to reduce latency during inference.
Inference: The stage at which a trained AI model takes new input and produces a prediction or generates an answer.
Think of a busy restaurant kitchen. The main Codex model is the executive chef who spends hours designing the menu and sourcing ingredients. Spark is the line cook who chops the onions and plates the food during the dinner rush. You do not need the executive chef to chop onions; you just need it done immediately.
Spark handles the immediate, low-level coding tasks that need to happen instantly. This frees up the larger, slower model to focus on complex logic and “deeper reasoning.” The Cerebras hardware acts as the high-speed station that allows the line cook to work without pausing.
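To make the division of labor concrete, here is a minimal sketch of how a client might decide which model gets a task. It is purely illustrative: the model labels, the `Task` fields, and the two-file threshold are all assumptions made up for this example, not OpenAI identifiers or anything the company has described.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only; these are not real API model names.
FAST_MODEL = "fast-lightweight-model"   # stands in for Spark, the "line cook"
DEEP_MODEL = "deep-reasoning-model"     # stands in for the larger Codex model


@dataclass
class Task:
    description: str
    files_touched: int    # rough size of the change (assumed signal)
    needs_planning: bool  # True for architectural or multi-step work


def choose_model(task: Task, max_fast_files: int = 2) -> str:
    """Send small, immediate edits to the fast model and everything else to the deep one.

    The threshold is an arbitrary assumption chosen for the example.
    """
    if task.needs_planning or task.files_touched > max_fast_files:
        return DEEP_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    tasks = [
        Task("rename a variable", files_touched=1, needs_planning=False),
        Task("redesign the auth module", files_touched=12, needs_planning=True),
    ]
    for task in tasks:
        print(f"{task.description} -> {choose_model(task)}")
```

In practice the choice may simply be left to the user picking a mode. The point of the sketch is only that quick edits and heavy planning are different jobs that can go to different models.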
The catch
This tool is not smart enough for everything. OpenAI explicitly states that Spark is for “rapid prototyping” rather than the “longer, heavier tasks” the original model handles. If you need deep architectural planning or complex execution, this lightweight version will likely struggle. It trades depth for speed.
Access is also restricted. The model is currently available only as a “research preview.” You must be a ChatGPT Pro user to access it within the Codex app. There is no word yet on when or if this will roll out to free users or enterprise tiers.
What now?
If you are a ChatGPT Pro user, you can access the research preview in the Codex app starting today. OpenAI frames this as the “first milestone” in their partnership with Cerebras, so expect more specialized models to run on this hardware soon.
Watch to see if the reported speed improvements hold up under real-world traffic, as this is the first major public test of the Cerebras chips at this scale.