Tachyum demonstrates enhanced reliability with DRAM Failover on Prodigy Universal Processor for large-scale AI and HPC applications

Tachyum announced a significant advancement in memory error correction technology with the successful demonstration of DRAM Failover on its Prodigy Universal Processor. This breakthrough enhances reliability for large-scale AI and high-performance computing (HPC) applications, even in the event of DRAM chip failures.
 
Tachyum’s DRAM Failover technology offers superior memory error correction, providing a higher level of protection than traditional Error Correction Code (ECC). It can correct multi-bit errors within a single memory chip or across multiple chips, ensuring continued memory operation despite device-level faults. This capability allows systems to tolerate complete DRAM chip failures without impacting performance or reliability. 
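The announcement does not describe how DRAM Failover is implemented. As a generic illustration of how a memory word can survive the loss of an entire chip, the sketch below uses simple XOR parity across chips. This is a stand-in technique, not Tachyum's design. The function names and the four-data-chip layout are hypothetical, and it assumes the failed chip has already been identified. Production chip-failure correction typically relies on stronger symbol-based codes.

```python
from functools import reduce
from typing import List, Optional


def parity(chunks: List[int]) -> int:
    """XOR of all per-chip chunks."""
    return reduce(lambda a, b: a ^ b, chunks, 0)


def write_word(data_chunks: List[int]) -> List[int]:
    """Store the data chunks plus one parity chunk on a hypothetical extra chip."""
    return data_chunks + [parity(data_chunks)]


def read_word(stored: List[Optional[int]]) -> List[int]:
    """Return the data chunks, rebuilding at most one chunk marked None (failed chip)."""
    missing = [i for i, chunk in enumerate(stored) if chunk is None]
    if len(missing) > 1:
        raise ValueError("simple parity can rebuild only one failed chip")
    if missing:
        # XOR of the surviving chunks (data and parity) equals the lost chunk.
        rebuilt = parity([chunk for chunk in stored if chunk is not None])
        i = missing[0]
        stored = stored[:i] + [rebuilt] + stored[i + 1:]
    return stored[:-1]  # drop the parity chunk


word = [0xA, 0x3, 0xF, 0x0]   # illustrative 4-bit chunks, one per data chip
stored = write_word(word)
stored[2] = None              # simulate a complete failure of chip 2
assert read_word(stored) == word
```

The point of the sketch is only that redundancy spread across chips lets a read complete even when one device returns nothing at all.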

As AI clusters scale to include up to 100,000 accelerators, the time between failures can be as short as a few hours, presenting a significant reliability challenge. Tachyum’s DRAM Failover addresses this by preserving customer data and maintaining system availability, making it ideal for HPC systems and high-end servers with large memory capacities.

A single Prodigy processor can support 640 or 1280 DRAM chips, which translates to 64,000,000 DRAM chips at scale (for example, 100,000 processors with 640 chips each). With DRAM Failover, one failing DRAM die per DIMM will not affect system operation, which is not the case for GPU accelerators. This validation underscores Tachyum’s commitment to robust Reliability, Availability, and Serviceability (RAS) features and addresses the market’s interest in large-scale AI, including Cognitive AI and Artificial General Intelligence (AGI).
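The chip counts above can be checked with a few lines of arithmetic. The sketch below assumes a cluster of 100,000 processors, matching the accelerator count cited earlier. It also estimates how often some chip in the cluster would fail, using a placeholder per-chip failure rate: `HYPOTHETICAL_FIT` is an illustrative value, not a figure from Tachyum or any DRAM vendor.

```python
# Back-of-the-envelope check of the chip counts quoted in the article.
CHIPS_PER_PROCESSOR = (640, 1280)  # from the article
PROCESSORS = 100_000               # assumption: one processor per accelerator slot
HYPOTHETICAL_FIT = 10.0            # placeholder: failures per 10^9 chip-hours


def total_chips(chips_per_processor: int, processors: int = PROCESSORS) -> int:
    """Total DRAM chips in the cluster."""
    return chips_per_processor * processors


def hours_between_chip_failures(n_chips: int, fit_per_chip: float) -> float:
    """Mean hours between chip failures anywhere in the cluster.

    Assumes independent failures at a constant rate (FIT = failures per 10^9 hours).
    """
    return 1e9 / (n_chips * fit_per_chip)


for chips in CHIPS_PER_PROCESSOR:
    n = total_chips(chips)
    hours = hours_between_chip_failures(n, HYPOTHETICAL_FIT)
    print(f"{chips} chips/processor: {n:,} chips, about {hours:.2f} h between chip failures")
```

With tens of millions of chips, even a very small per-chip failure rate yields cluster-wide failures on a timescale of hours, which is the reliability problem the article describes.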

“This capability is essential to increase the scale of AI training as it moves from Large Language Models and Generative AI to much bigger systems needed for Cognitive AI and AGI,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “The importance of using DRAM Failover on Tachyum’s platform will be even more evident as we increase memory capacity per Prodigy processor with every generation.”

AI innovator DeepSeek is an example of how a model can benefit more from DRAM capacity than from memory bandwidth, making it a compelling use case for Prodigy. DeepSeek’s efficiency, akin to the human brain’s selective neuron firing, highlights the benefits of larger DRAM capacity without reliability challenges, further establishing Prodigy’s advantages.

Prodigy is a Universal Processor offering industry-leading performance for all workloads, so Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) with a single homogeneous architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 256 high-performance custom-designed 64-bit compute cores to deliver up to 18x the performance of the highest-performing GPU for AI applications, 3x the performance of the highest-performing x86 processors for cloud workloads, and up to 8x that of the highest-performing GPU for HPC.

For more information, visit www.tachyum.com.
