Cerebras Systems has announced support for OpenAI’s first open-weight reasoning model, gpt-oss-120B, on its AI Inference Cloud, achieving an output speed of 3,000 tokens per second. The 120B-parameter model, designed for complex tasks in math, science, and code, matches the intelligence of proprietary models like Gemini 2.5 Flash and Claude Opus 4. It runs on Cerebras’ wafer-scale AI infrastructure, which eliminates GPU memory bandwidth bottlenecks and communication overhead.
The collaboration allows developers to integrate gpt-oss-120B into existing OpenAI endpoints in 15 seconds without refactoring or migration. The model, licensed under Apache 2.0, enables users to fine-tune for specific domains, deploy on-premises for sensitive data, or operate across clouds. Applications include live coding assistants, instant large document Q&A, summarization, and agentic research chains, with reduced wait times compared to proprietary models on GPUs.
Dmitry Pimenov, product lead at OpenAI, stated that the open-weight model allows developers and enterprises to customize and deploy AI on their infrastructure, supporting innovation and scalability through partners like Cerebras. Andrew Feldman, CEO and co-founder of Cerebras, noted that the deployment offers high performance, cost efficiency, and ease of use for the AI community.
Developers and enterprises can access gpt-oss-120B on the Cerebras Cloud with a free API key at cerebras.ai/openai.






