Cerebras Enables OpenAI’s gpt-oss-120B Model with Record-Breaking AI Inference Speeds

August 6, 2025|2 min read

Listen to this story

ⓘ AI NARRATED

0:00 / 0:00

Cerebras Systems has announced support for OpenAI’s first open-weight reasoning model, gpt-oss-120B, on its AI Inference Cloud, achieving an output speed of 3,000 tokens per second. The 120B-parameter model, designed for complex tasks in math, science, and code, matches the intelligence of proprietary models like Gemini 2.5 Flash and Claude Opus 4. It runs on Cerebras’ wafer-scale AI infrastructure, which eliminates GPU memory bandwidth bottlenecks and communication overhead.

The collaboration allows developers to integrate gpt-oss-120B into existing OpenAI endpoints in 15 seconds without refactoring or migration. The model, licensed under Apache 2.0, enables users to fine-tune for specific domains, deploy on-premises for sensitive data, or operate across clouds. Applications include live coding assistants, instant large document Q&A, summarization, and agentic research chains, with reduced wait times compared to proprietary models on GPUs.

Dmitry Pimenov, product lead at OpenAI, stated that the open-weight model allows developers and enterprises to customize and deploy AI on their infrastructure, supporting innovation and scalability through partners like Cerebras. Andrew Feldman, CEO and co-founder of Cerebras, noted that the deployment offers high performance, cost efficiency, and ease of use for the AI community.

Developers and enterprises can access gpt-oss-120B on the Cerebras Cloud with a free API key at cerebras.ai/openai.

Explore more

Advanced Semiconductor+ Explore Sensors+ Explore Automotive+ Explore

More from AI →

Longsys Hits One Million Units in Monthly mSSD Production, Boosting Edge AI Storage Rollout

OpenAI, Broadcom Introduce New Intelligence Processor Built for LLM Inference

OpenAI and Broadcom have introduced Jalapeño, OpenAI's first Intelligence Processor, an accelerator…

Qualcomm Unveils Data Center Strategy, Eyes Multiple Growth Inflection Points Over Next 3-5 Years

At its 2026 Investor Day, Qualcomm laid out an accelerated diversification strategy…