ADVERTISEMENT
Advertisement
AI

Cerebras Enables OpenAI’s gpt-oss-120B Model with Record-Breaking AI Inference Speeds

Listen to this story

AI NARRATED
0:00 / 0:00

Cerebras Systems has announced support for OpenAI’s first open-weight reasoning model, gpt-oss-120B, on its AI Inference Cloud, achieving an output speed of 3,000 tokens per second. The 120B-parameter model, designed for complex tasks in math, science, and code, matches the intelligence of proprietary models like Gemini 2.5 Flash and Claude Opus 4. It runs on Cerebras’ wafer-scale AI infrastructure, which eliminates GPU memory bandwidth bottlenecks and communication overhead.

ADVERTISEMENT
Advertisement

The collaboration allows developers to integrate gpt-oss-120B into existing OpenAI endpoints in 15 seconds without refactoring or migration. The model, licensed under Apache 2.0, enables users to fine-tune for specific domains, deploy on-premises for sensitive data, or operate across clouds. Applications include live coding assistants, instant large document Q&A, summarization, and agentic research chains, with reduced wait times compared to proprietary models on GPUs.

Dmitry Pimenov, product lead at OpenAI, stated that the open-weight model allows developers and enterprises to customize and deploy AI on their infrastructure, supporting innovation and scalability through partners like Cerebras. Andrew Feldman, CEO and co-founder of Cerebras, noted that the deployment offers high performance, cost efficiency, and ease of use for the AI community.

ADVERTISEMENT
Advertisement

Developers and enterprises can access gpt-oss-120B on the Cerebras Cloud with a free API key at cerebras.ai/openai.


More from AI