IBM's Dharmendra Modha and team develop 25x more energy-efficient AI neural inference chip called NorthPole

Date: 30/10/2023
IBM Research's newest prototype AI chip, called NorthPole, is based on a new chip architecture and is faster and more energy-efficient than the most advanced processor chips available today. After two decades of research, the prototype, built at IBM's lab in California, is said to be a drastic shift away from the von Neumann bottleneck of past designs, where the processor and the memory were discrete. IBM is developing the chip amid the massive adoption of AI in computing. NorthPole offers the option of customizing bit precision as needed, which allows power usage to be optimized, a capability that distinguishes it from analog in-memory computing.
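To illustrate the trade-off behind configurable bit precision, here is a minimal sketch of quantizing a weight tensor to 8-, 4-, or 2-bit values; the symmetric uniform quantization scheme shown is an assumption for illustration, not NorthPole's actual quantization pipeline.

```python
# Minimal sketch: symmetric uniform quantization of a weight tensor to a
# configurable bit width (8, 4, or 2 bits). Illustrative only; not NorthPole's
# actual quantization scheme.
import numpy as np

def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Quantize weights to signed integers of the given bit width, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(weights)) / qmax    # one scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale                          # dequantized values for comparison

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
for bits in (8, 4, 2):
    err = np.mean(np.abs(w - quantize(w, bits)))
    print(f"{bits}-bit: mean abs quantization error = {err:.4f}")
```

Lower precision increases quantization error but, on hardware like NorthPole, raises the number of operations available per cycle and per watt, which is why the precision can be tuned per workload.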

IBM reported that it tested the chip on the popular ResNet-50 image-recognition and YOLOv4 object-detection models, and claims the device delivered higher energy efficiency, higher space efficiency, and lower latency than any other chip currently on the market, while running roughly 4,000 times faster than TrueNorth, the previous brain-inspired chip IBM developed and unveiled in 2014. The NorthPole results were published in Science.

The team, headed by Dharmendra Modha, drew inspiration from how the brain computes. “Architecturally, NorthPole blurs the boundary between compute and memory,” Modha said. “At the level of individual cores, NorthPole appears as memory-near-compute, and from outside the chip, at the level of input-output, it appears as an active memory.”

Pic: NorthPole
Source: Modha's blog
Over the last eight years, Modha has been working on NorthPole, a new type of digital AI chip for neural inference that extends the ideas behind TrueNorth.

IBM said the 12 nm NorthPole is far more efficient than other popular 12 nm GPUs and 14 nm CPUs. NorthPole is reported to be 25x more energy-efficient, measured as the number of frames interpreted per joule of power consumed. NorthPole also performed well on latency and on space efficiency, measured as frames interpreted per second per billion transistors required.
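For concreteness, here is a minimal sketch of how the two efficiency metrics quoted above could be computed from raw benchmark measurements; the measurement values used are placeholders, not IBM's published figures.

```python
# Minimal sketch: computing the energy- and space-efficiency metrics mentioned
# above from raw benchmark measurements. All numbers are placeholders.

def frames_per_joule(frames_processed: int, energy_joules: float) -> float:
    """Energy efficiency: inferences delivered per joule consumed."""
    return frames_processed / energy_joules

def frames_per_sec_per_billion_transistors(frames_per_second: float,
                                            transistor_count: float) -> float:
    """Space efficiency: throughput normalized by billions of transistors."""
    return frames_per_second / (transistor_count / 1e9)

# Hypothetical measurement: 10,000 frames in 2 s at an average draw of 50 W,
# on a chip with 22 billion transistors.
frames, seconds, watts = 10_000, 2.0, 50.0
print("frames/J        :", frames_per_joule(frames, watts * seconds))
print("FPS/B-transistor:", frames_per_sec_per_billion_transistors(frames / seconds, 22e9))
```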

Modha reports that, on ResNet-50, NorthPole outperformed all major prevalent architectures, including advanced GPUs fabricated in a 4 nm process.

Pic: NorthPole
Source: Modha's blog

Fabricated in a 12 nm process, NorthPole packs 22 billion transistors into 800 square millimeters and integrates all of its memory on-chip, so no data moves off-chip for stores and fetches, eliminating the von Neumann bottleneck; the entire network resides on a single chip. NorthPole features 256 cores, each of which can deliver 2,048 operations per cycle at 8-bit precision, with the potential to double or quadruple the number of operations at 4-bit and 2-bit precision, respectively.
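The per-cycle arithmetic implied by those figures is easy to work out; the sketch below does so, noting that the clock frequency used for the per-second estimate is an assumption for illustration, not a published specification.

```python
# Peak operations per cycle implied by the figures above: 256 cores x 2,048
# 8-bit ops per core per cycle, doubling at 4-bit and quadrupling at 2-bit.
CORES = 256
OPS_PER_CORE_8BIT = 2_048
ASSUMED_CLOCK_HZ = 400e6   # assumption for illustration only; not a published spec

for bits, multiplier in ((8, 1), (4, 2), (2, 4)):
    ops_per_cycle = CORES * OPS_PER_CORE_8BIT * multiplier
    tops = ops_per_cycle * ASSUMED_CLOCK_HZ / 1e12
    print(f"{bits}-bit: {ops_per_cycle:,} ops/cycle "
          f"(~{tops:.0f} TOPS at an assumed {ASSUMED_CLOCK_HZ / 1e6:.0f} MHz)")
```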

By using an approach called scale-out, NorthPole can support larger neural networks by breaking them down into smaller sub-networks that fit within NorthPole's model memory and connecting these sub-networks together across multiple NorthPole chips.
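As a rough illustration of the partitioning idea behind scale-out, the sketch below greedily splits a network's layers into consecutive chunks that each fit within a per-chip model-memory budget; the layer sizes, the budget value, and the greedy strategy are all assumptions for illustration, not NorthPole's actual mapping algorithm.

```python
# Minimal sketch of the scale-out idea: split a network's layers into
# consecutive sub-networks whose weights fit within a per-chip memory budget,
# assigning one sub-network per chip. Sizes and budget are illustrative only.
from typing import List

def partition_layers(layer_sizes: List[int], chip_budget: int) -> List[List[int]]:
    """Group consecutive layers into chunks that each fit within chip_budget."""
    chunks, current, used = [], [], 0
    for size in layer_sizes:
        if size > chip_budget:
            raise ValueError("a single layer exceeds one chip's model memory")
        if used + size > chip_budget:      # start a new sub-network / chip
            chunks.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        chunks.append(current)
    return chunks

# Hypothetical layer weight sizes (MB) and a hypothetical 192 MB per-chip budget.
layers_mb = [40, 60, 80, 30, 50, 70, 90]
for i, chunk in enumerate(partition_layers(layers_mb, chip_budget=192)):
    print(f"chip {i}: layers totalling {sum(chunk)} MB -> {chunk}")
```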

Pic: The NorthPole chip on a PCIe card
Source: IBM

To learn more, visit: https://research.ibm.com/blog/northpole-ibm-ai-chip

Further references:
NorthPole: Neural Inference at the Frontier of Energy, Space, and Time
D. S. Modha et al., "Neural inference at the frontier of energy, space, and time," Science (2023)