
AI Accelerator with Processing-in-Memory Architecture Claims Highest Power Efficiency

June 13, 2019 by Scott McMahan

Renesas Electronics Corporation has announced that it developed an AI accelerator that performs CNN (convolutional neural network) processing at high speed and low power. The company intends to use the technology in its next generation of embedded AI (e-AI), which it hopes will accelerate the move toward more intelligent endpoint devices.

A Renesas test chip featuring this accelerator achieved a power efficiency of 8.8 TOPS/W, which the company describes as the industry's highest class of power efficiency.
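For a sense of scale, TOPS/W (tera-operations per second per watt) is equivalent to tera-operations per joule, because a watt is one joule per second. The short Python sketch below converts the reported 8.8 TOPS/W into energy per operation. The function names and the 10 mW budget are illustrative assumptions, and the article does not state how Renesas counts operations (for example, whether one multiply-and-accumulate counts as one operation or two).

```python
def energy_per_op_joules(tops_per_watt: float) -> float:
    """Convert TOPS/W into joules per operation.

    1 TOPS/W = 1e12 operations per second per watt = 1e12 operations per joule,
    so the energy per operation is the reciprocal.
    """
    return 1.0 / (tops_per_watt * 1e12)


def throughput_tops(tops_per_watt: float, power_watts: float) -> float:
    """Throughput (in TOPS) sustainable within a given power budget."""
    return tops_per_watt * power_watts


# 8.8 TOPS/W is the efficiency reported for the Renesas test chip;
# the 10 mW budget is a hypothetical figure for illustration.
energy = energy_per_op_joules(8.8)
print(f"{energy * 1e15:.1f} fJ per operation")
print(f"{throughput_tops(8.8, 0.010):.3f} TOPS at 10 mW")
```

At 8.8 TOPS/W, each operation costs roughly 114 femtojoules (about 0.11 pJ).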

Renesas based the AI accelerator on the processing-in-memory (PIM) architecture, an increasingly popular approach for AI technology in which multiply-and-accumulate operations are performed in the memory circuit as data is read from that memory.
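The PIM idea can be illustrated with an idealized model of a single bit line: each memory cell contributes a current proportional to its stored weight times the input on its word line, and the bit line sums those currents, so reading the column yields a dot product. This is a simplified sketch under stated assumptions (unit currents, no noise, signed values treated as signed currents); real designs encode signs and levels differently, and the function names are hypothetical.

```python
from typing import List, Sequence


def pim_column_readout(weights: Sequence[int], inputs: Sequence[int],
                       unit_current: float = 1.0) -> float:
    """Idealized current on one PIM bit line.

    Each cell on the bit line stores one weight. Driving its word line with an
    input value makes the cell contribute weight * input units of current, and
    the bit line sums every contribution (Kirchhoff's current law). The total is
    therefore the dot product, obtained in the same step as the memory read.
    Noise, non-linearity, and sign encoding are ignored in this model.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return unit_current * sum(w * x for w, x in zip(weights, inputs))


def pim_matvec(weight_columns: Sequence[Sequence[int]],
               inputs: Sequence[int]) -> List[float]:
    """Each column (bit line) yields one output neuron's weighted sum."""
    return [pim_column_readout(col, inputs) for col in weight_columns]


# Binary (0/1) weights, as in earlier single-bit PIM designs.
print(pim_matvec([[1, 0, 1], [0, 1, 1]], [1, 1, 0]))
```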

Three newly developed Renesas technologies went into the AI accelerator.

The first is a ternary-valued (-1, 0, 1) SRAM structure PIM technology that can perform large-scale CNN computations. The second is an SRAM circuit that uses comparators to read out memory data at low power. The third is a technology that prevents calculation errors caused by process variations in manufacturing. Together, these technologies let Renesas reduce both the memory access time in deep learning processing and the power required for multiply-and-accumulate operations.

The new accelerator maintained an accuracy of more than 99% when evaluated on the MNIST handwritten character recognition test.

Renesas presented these results on June 13 at the 2019 Symposia on VLSI Technology and Circuits, held June 9-14, 2019, in Kyoto, Japan. The company also demonstrated real-time image recognition using a prototype AI module in which the test chip, powered by a small battery, was connected to a microcontroller, a camera, other peripheral devices, and development tools.

Previously, PIM architectures could not achieve adequate accuracy for large-scale CNN computations because they relied on single-bit calculations: the binary (0, 1) SRAM structure could only handle data with values of 0 or 1.

Manufacturing process variations also reduced the reliability of the calculations, so workarounds were required.

Renesas has now developed technologies that resolve these accuracy and reliability issues. It plans to apply them as leading-edge technology for the revolutionary AI chips of the future, in next-generation e-AI solutions for applications, such as wearable equipment and robots, that require both performance and power efficiency.

Key Features of the Newly Developed Technology

A ternary (-1, 0, 1) SRAM structure PIM that can adjust its number of calculation bits to the required accuracy

The ternary (-1, 0, 1) SRAM structure PIM architecture combines a ternary memory with a simple digital calculation block, keeping both the added hardware and the calculation errors to a minimum.
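One reason a ternary memory pairs well with a simple digital block is that a weight of -1, 0, or 1 turns every multiplication into an add, a skip, or a subtract. The sketch below shows that general idea in Python; it is not Renesas's circuit, and `ternary_mac` is a hypothetical name.

```python
from typing import Sequence


def ternary_mac(weights: Sequence[int], inputs: Sequence[int]) -> int:
    """Multiply-and-accumulate with weights restricted to -1, 0, or 1.

    Each product w * x is x, 0, or -x, so the accumulator only needs to add,
    skip, or subtract; no hardware multiplier is required.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    acc = 0
    for w, x in zip(weights, inputs):
        if w == 1:
            acc += x
        elif w == -1:
            acc -= x
        elif w != 0:
            raise ValueError(f"weight {w} is not ternary")
        # w == 0: the input contributes nothing and is skipped
    return acc


print(ternary_mac([1, -1, 0, 1], [4, 1, 9, 2]))
```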

The ternary SRAM structure PIM also allows the number of bits to be switched, for example between 1.5-bit (ternary) and 4-bit calculations, depending on the required accuracy.

Because it supports different accuracy requirements and calculation scales, the design allows the balance between accuracy and power consumption to be optimized.
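A ternary value carries log2(3) ≈ 1.58 bits of information, which is why ternary calculation is described as 1.5-bit. The sketch below quantizes the same weights at either precision. The ternary threshold (0.7 times the mean absolute weight) and the symmetric 4-bit scaling are common heuristics from the quantization literature, assumed here for illustration; the article does not say how Renesas maps weights to either mode.

```python
import math
from typing import List


def quantize_ternary(weights: List[float]) -> List[int]:
    """Map weights to {-1, 0, 1} (the "1.5-bit" case).

    Weights whose magnitude is at or below the threshold become 0. The threshold
    of 0.7 * mean(|w|) is an assumed heuristic, not a documented Renesas choice.
    """
    threshold = 0.7 * sum(abs(w) for w in weights) / len(weights)
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def quantize_signed(weights: List[float], bits: int) -> List[int]:
    """Symmetric uniform quantization to signed integers of the given width.

    For bits=4 the codes span -7..7; the -8 code is left unused to keep the
    range symmetric.
    """
    qmax = 2 ** (bits - 1) - 1
    largest = max(abs(w) for w in weights)
    if largest == 0:
        return [0] * len(weights)
    scale = largest / qmax
    return [max(-qmax, min(qmax, round(w / scale))) for w in weights]


weights = [0.9, -0.05, -0.6, 0.3]
print(f"ternary carries {math.log2(3):.2f} bits per value")
print("1.5-bit:", quantize_ternary(weights))
print("4-bit:  ", quantize_signed(weights, 4))
```

Switching precision then comes down to choosing which quantizer (and bit width) to apply per layer or per task, trading accuracy against the number of bits the hardware must process.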

A high-precision, low-power memory data readout circuit that combines comparators and replica cells

In a PIM architecture, memory data is read out by detecting the value of the bit-line current in the SRAM structure. Although ADCs are effective for high-precision bit-line current detection, they result in high power consumption and larger chip area.

Instead, Renesas combined a comparator (a 1-bit sense amplifier) with a replica cell whose current can be flexibly controlled, producing a high-precision memory data readout circuit.
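The article does not describe how the comparator and replica cell are operated, but one way a single 1-bit comparator can stand in for an ADC is to step the replica cell's reference current and compare repeatedly. The sketch below assumes a binary search over evenly spaced current levels, with each reference set halfway between two adjacent levels; the stepping scheme and all names are hypothetical.

```python
def comparator(bitline_current: float, reference_current: float) -> int:
    """1-bit sense amplifier: outputs 1 when the bit line exceeds the reference."""
    return 1 if bitline_current > reference_current else 0


def read_level(bitline_current: float, unit_current: float, levels: int) -> int:
    """Resolve which of `levels` evenly spaced current levels the bit line carries.

    A replica cell supplies an adjustable reference current. Here it is stepped
    in a binary search (an assumption for illustration), so one comparator
    resolves the level in about log2(levels) comparisons. Each reference sits
    halfway between two ideal levels, so small current errors still read out
    correctly.
    """
    lo, hi = 0, levels - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        # Replica reference placed halfway between level mid - 1 and level mid.
        reference = (mid - 0.5) * unit_current
        if comparator(bitline_current, reference):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(read_level(5.2, unit_current=1.0, levels=8))
```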

These technologies also exploit the fact that only a small fraction of nodes (neurons), about 1%, are activated during neural network operation. Stopping the readout circuits for unactivated nodes lowers power consumption even further.
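The saving from skipping unactivated nodes can be sketched as follows, assuming an activity flag is already known for each node (the article does not say how the chip determines it). Only active nodes are read out; the rest are skipped, standing in for readout circuits that are switched off. The function names are hypothetical.

```python
from typing import Dict, Sequence


def gated_readout(bitline_currents: Sequence[float], active: Sequence[bool],
                  unit_current: float = 1.0) -> Dict[int, int]:
    """Read out only nodes flagged as active.

    Inactive nodes are skipped entirely, modeling a readout circuit that is
    powered down. Returns {node_index: quantized level} for nodes actually read.
    """
    if len(bitline_currents) != len(active):
        raise ValueError("one activity flag is needed per node")
    results: Dict[int, int] = {}
    for i, (current, is_active) in enumerate(zip(bitline_currents, active)):
        if not is_active:
            continue  # readout circuit for this node stays off
        results[i] = round(current / unit_current)
    return results


def skipped_fraction(active: Sequence[bool]) -> float:
    """Fraction of readout operations avoided by gating."""
    return 1.0 - sum(active) / len(active)


# Example: 1,000 nodes with every 100th one active (about 1%, as in the article).
n = 1000
active = [i % 100 == 0 for i in range(n)]
currents = [float(i % 8) for i in range(n)]
levels = gated_readout(currents, active)
print(len(levels), f"{skipped_fraction(active):.0%} of readouts skipped")
```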

A variation avoidance technology that suppresses calculation errors caused by manufacturing process variations

The PIM architecture is prone to calculation errors caused by process variations in manufacturing. These variations introduce errors in the bit-line current values within the SRAM structure, and those current errors in turn cause errors in the memory data readout.
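To see why process variation corrupts readout, the sketch below perturbs each conducting cell's current by a random factor and checks whether the summed bit-line current still rounds to the correct count. The Gaussian variation model, the unit cell current, and the function names are assumptions for illustration, not Renesas's characterization.

```python
import random
from typing import List


def bitline_current(cell_currents: List[float], sigma: float,
                    rng: random.Random) -> float:
    """Sum of cell currents, each scaled by a random process-variation factor."""
    return sum(c * (1.0 + rng.gauss(0.0, sigma)) for c in cell_currents)


def readout_error_rate(n_cells: int, sigma: float, trials: int,
                       seed: int = 0) -> float:
    """Fraction of trials in which rounding the bit-line current gives the
    wrong count of conducting cells."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Each cell conducts one unit of current (1.0) or none (0.0).
        cells = [float(rng.randint(0, 1)) for _ in range(n_cells)]
        ideal = int(sum(cells))
        measured = round(bitline_current(cells, sigma, rng))
        errors += measured != ideal
    return errors / trials


for sigma in (0.0, 0.05, 0.2):
    print(sigma, readout_error_rate(n_cells=64, sigma=sigma, trials=1000))
```

Because independent per-cell errors add up along the bit line, the absolute error grows with the number of conducting cells (roughly with its square root), so longer columns are more sensitive to the same per-cell variation.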

To resolve the variation problem, Renesas placed multiple SRAM calculation circuit blocks across the chip and uses the blocks with minimal manufacturing process variation to perform the calculations.

Because activated nodes make up only a small minority of all nodes (about 1%), they can be allocated selectively to the SRAM calculation circuit blocks with minimal manufacturing process variation.

With these low-variation blocks performing the calculations, calculation errors drop to a level where they can essentially be ignored.
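The allocation step can be sketched as ranking the calculation blocks by a measured variation score and filling the lowest-variation blocks with active nodes first. The article does not describe how Renesas measures variation or sizes its blocks, so the score, the per-block capacity, and the block IDs below are hypothetical.

```python
from typing import Dict, List


def allocate_active_nodes(active_nodes: List[int],
                          block_variation: Dict[str, float],
                          block_capacity: int) -> Dict[int, str]:
    """Assign each active node to an SRAM calculation block, best blocks first.

    block_variation maps a (hypothetical) block ID to a variation score where
    lower is better. Because only about 1% of nodes are active, they usually
    fit entirely in the few lowest-variation blocks.
    """
    ranked = sorted(block_variation, key=lambda b: block_variation[b])
    needed_blocks = -(-len(active_nodes) // block_capacity)  # ceiling division
    if needed_blocks > len(ranked):
        raise ValueError("not enough calculation blocks for the active nodes")
    # Fill the lowest-variation block up to capacity, then the next, and so on.
    return {node: ranked[i // block_capacity]
            for i, node in enumerate(active_nodes)}


variation = {"B0": 0.08, "B1": 0.02, "B2": 0.05, "B3": 0.01}
print(allocate_active_nodes([3, 17, 42, 99, 250, 731], variation, block_capacity=4))
```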

Since introducing the embedded AI (e-AI) concept in 2015, Renesas has advanced the development of several e-AI solutions. The company defined four classes based on the effectiveness of e-AI implementations and has been developing e-AI solutions around them.

Class 1: Judging whether signal waveform data is normal or abnormal.
Class 2 (100 GOPS/W class): Judging whether data is normal or abnormal using real-time image processing.
Class 3 (1 TOPS/W class): Performing recognition in real time.
Class 4 (10 TOPS/W class): Enabling incremental learning at an endpoint.

Renesas introduced an e-AI development environment in 2017, and in 2018 announced the RZ/A2M microprocessor, which integrates the company's proprietary DRP (dynamically reconfigurable processor) on chip. Renesas offers these technologies for applications up to class 2.

For class 3 applications, Renesas has further enhanced the computational performance of this DRP technology.

Renesas says that the newly unveiled technology could be one of the key technologies for implementing future class 4 applications.