
OpenAI Releases GPT-5.3-Codex-Spark Model Using Cerebras Chips For Low Latency Inference

March 4, 2026
in Model Releases

The race for artificial intelligence usually focuses on making models smarter, larger, and more capable of complex thought. OpenAI just released a tool that focuses entirely on making them faster. This new model runs on a massive chip from a company called Cerebras rather than the standard hardware used for most AI training. It suggests that the next bottleneck for AI companies is not just intelligence but the sheer amount of time it takes to get an answer.

Key Takeaways

  • OpenAI released GPT-5.3-Codex-Spark, a lightweight model designed for low-latency inference.
  • The tool is powered by Cerebras’ Wafer Scale Engine 3, which contains 4 trillion transistors.
  • OpenAI and Cerebras entered a multi-year hardware agreement worth over $10 billion.

The new model is called GPT-5.3-Codex-Spark. It is a stripped-down version of the agentic coding tool OpenAI released earlier this month. The company describes it as a “smaller version” built specifically for speed. While the original model handles heavy lifting, Spark is designed for “rapid iteration” and real-time work.

This release is the first public result of a massive partnership between OpenAI and chipmaker Cerebras. The two companies signed a multi-year deal worth over $10 billion to integrate Cerebras hardware into OpenAI’s infrastructure. CEO Sam Altman hinted at the launch on social media, noting that the new tool “sparks joy” for him.

The big deal

Speed changes how people use software. When an AI takes ten seconds to think, you treat it like a consultant. When it answers instantly, you treat it like a tool. OpenAI is trying to move its coding products into that second category. By splitting its offering into a “deep reasoning” mode and a “rapid response” mode, the company is acknowledging that users do not always need a genius. Sometimes they just need a fast answer.

The hardware bet is equally significant. The industry relies heavily on standard GPU clusters, but OpenAI is putting serious capital into Cerebras and its Wafer Scale Engine 3. This chip is enormous, holding 4 trillion transistors, and is built specifically to move data quickly. The partnership validates Cerebras, a company that recently raised $1 billion and is eyeing an IPO, as a serious player in the hardware market.

How it works

The system relies on a specialized chip designed to reduce latency during inference.

Inference: The process where a trained AI model processes data to make a prediction or generate an answer.

Think of a busy restaurant kitchen. The main Codex model is the executive chef who spends hours designing the menu and sourcing ingredients. Spark is the line cook who chops the onions and plates the food during the dinner rush. You do not need the executive chef to chop onions; you just need it done immediately.

Spark handles the immediate, low-level coding tasks that need to happen instantly. This frees up the larger, slower model to focus on complex logic and “deeper reasoning.” The Cerebras hardware acts as the high-speed station that allows the line cook to work without pausing.
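To make the division of labor concrete, here is a minimal sketch of how a client might decide which model gets a task. Everything in it is an assumption for illustration: the tier labels, the CodingTask fields, and the two-file threshold are hypothetical and do not come from OpenAI. It only shows the general idea of sending small, immediate edits to the fast model and everything else to the slower one.

```python
from dataclasses import dataclass

# Hypothetical tier labels for illustration only; not real API model identifiers.
FAST_TIER = "spark"   # lightweight, low-latency model
DEEP_TIER = "codex"   # larger model for longer, heavier tasks


@dataclass
class CodingTask:
    description: str
    files_touched: int     # rough proxy for how broad the change is
    needs_planning: bool   # True for architectural or multi-step work


def choose_tier(task: CodingTask, max_fast_files: int = 2) -> str:
    """Send small, immediate edits to the fast tier and everything else to the deep tier."""
    if task.needs_planning or task.files_touched > max_fast_files:
        return DEEP_TIER
    return FAST_TIER


if __name__ == "__main__":
    rename = CodingTask("rename a variable", files_touched=1, needs_planning=False)
    redesign = CodingTask("redesign the auth module", files_touched=12, needs_planning=True)
    print(choose_tier(rename))    # the line cook takes this one
    print(choose_tier(redesign))  # the executive chef takes this one
```

A real product would likely use richer signals than a file count, but the shape is the same: a cheap check up front decides whether the request can be answered quickly or needs the heavier model.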

The catch

This tool is not smart enough for everything. OpenAI explicitly states that Spark is for “rapid prototyping” rather than the “longer, heavier tasks” the original model handles. If you need deep architectural planning or complex execution, this lightweight version will likely struggle. It trades depth for speed.

Access is also restricted. The model is currently available only as a “research preview.” You must be a ChatGPT Pro user to access it within the Codex app. The source does not specify when or if this will roll out to free users or other enterprise tiers.

What now?

If you are a ChatGPT Pro user, you can access the research preview in the Codex app starting today. OpenAI frames this as the “first milestone” in their partnership with Cerebras, so expect more specialized models to run on this hardware soon.

Watch to see if the reported speed improvements hold up under real-world traffic, as this is the first major public test of the Cerebras chips at this scale.
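If you want to check the speed claims yourself, one simple approach is to record how long each response takes and look at the slow tail as well as the typical case. The sketch below is an assumption about how you might summarize such timings. It does not call any OpenAI or Cerebras API, and the sample numbers are made up for illustration.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Report the typical latency (p50) and the slow tail (p95 and max)."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # Made-up response times in milliseconds; one slow outlier dominates the tail.
    timings = [120, 80, 100, 900, 90]
    print(summarize(timings))
```

The median shows what a typical request feels like, while the 95th percentile shows how bad the slow requests get. A model that is fast on average but slow at the tail will still feel sluggish under heavy traffic.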

© 2025 Tomorrow Explained. Built with 💚 by Dr.P
