Emerging edge intelligence requires low-cost, energy-efficient neural network (NN) processors. Supporting various types of edge NN models leads to extra circuit overhead, so designing a unified NN processor with high energy/area efficiency is challenging. This work presents a frequency-domain-accelerated unified NN processor, named STICKER-T. It combines algorithm-, architecture-, and circuit-level optimizations to achieve high energy/area efficiency. By utilizing the block-circulant NN (CirCNN) algorithm, this work supports frequency-domain acceleration and a unified workflow for convolutional, fully connected, and recurrent NNs (CNN/FC/RNN). Three key innovations are proposed. First, a block-circulant-accelerated chip architecture is implemented to support a unified CNN/FC/RNN workflow. Second, a multi-bit 8–128-point global-parallel local-bit-serial fast Fourier transform (FFT) module is designed for efficient high-throughput FFT/inverse FFT (IFFT) operation. Third, by utilizing a 6T hierarchical-bitline-switching transpose-SRAM (HBST-TRAM), 2-D data reuse is enabled in the proposed multi-bit frequency-domain multiply–accumulate (MAC) array. STICKER-T was fabricated in a 65-nm CMOS technology. It operates at 0.54–1.15 V and 25–200 MHz with 13.3–339 mW power consumption. The peak energy efficiency reaches 140.3 TOPS/W. At 4-bit precision, it achieves 8.1$\times$ the area efficiency and 4.2$\times$ the energy efficiency of the state-of-the-art reconfigurable NN processor.
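To illustrate why a block-circulant weight structure permits frequency-domain acceleration, the following is a minimal Python sketch of the general idea, not the STICKER-T datapath. It assumes each weight block is a circulant matrix defined by its first column, uses floating-point arithmetic rather than the chip's multi-bit fixed-point formats, and all function names are hypothetical. Each input sub-vector is transformed once, products are accumulated in the frequency domain, and one IFFT is applied per output block.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(x):
    """Inverse FFT with the 1/n scaling applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def circulant_matvec_direct(w, x):
    """Reference O(n^2) product: C[i][j] = w[(i - j) mod n], w = first column."""
    n = len(w)
    return [sum(w[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matvec_fft(w, x):
    """Same product via the convolution theorem: IFFT(FFT(w) * FFT(x))."""
    prod = [a * b for a, b in zip(fft(w), fft(x))]
    return [v.real for v in ifft(prod)]

def block_circulant_matvec(blocks, x, b):
    """blocks[p][q] holds the first column (length b) of circulant block (p, q).

    Input sub-vectors are transformed once; element-wise products are
    accumulated in the frequency domain, then one IFFT per output block.
    """
    num_cols = len(blocks[0])
    X = [fft(x[q * b:(q + 1) * b]) for q in range(num_cols)]
    y = []
    for row in blocks:
        acc = [0j] * b
        for wq, Xq in zip(row, X):
            Wq = fft(wq)  # in practice the weights could be stored pre-transformed
            acc = [a + u * v for a, u, v in zip(acc, Wq, Xq)]
        y.extend(v.real for v in ifft(acc))
    return y
```

In this sketch a b-by-b block is stored as b values instead of b squared, and the per-block multiply drops from O(b^2) to O(b log b) operations, which is the property that a frequency-domain MAC array can exploit.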
               