TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization

Computing-in-memory (CIM) is an attractive approach for energy-efficient deep neural network (DNN) processing, especially for low-power edge devices. However, today's typical DNNs usually exceed the capacity of the CIM static random access memory (SRAM). The resulting off-chip communication offsets the benefits of the CIM technique, so CIM processors still face a memory bottleneck. To eliminate this bottleneck, we propose a CIM processor, called TT@CIM, which applies the tensor-train decomposition (TTD) method to compress the entire DNN to fit within CIM-SRAM. However, the storage reduction achieved by TTD comes at the cost of multiple serial small-size matrix multiplications, resulting in a massive number of inefficient multiply-and-accumulate (MAC) and quantization operations (QuantOps). To achieve high energy efficiency, three optimization techniques are proposed in TT@CIM. First, a TTD-CIM-matched dataflow is proposed to maximize CIM utilization and minimize additional MAC operations. Second, a bit-level-sparsity-optimized CIM macro with a high-bit-level-sparsity encoding scheme is designed to reduce the power consumption of each MAC operation. Third, a variable-precision quantization method and a lookup-table-based quantization unit are presented to improve the performance and energy efficiency of QuantOps. Fabricated in 28-nm CMOS and tested on 4/8-bit decomposed DNNs, TT@CIM achieves 5.99-to-691.13-TOPS/W peak energy efficiency depending on the operating voltage.
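To make the compression idea concrete, the snippet below is a minimal NumPy sketch of tensor-train decomposition applied to a toy layer weight. It illustrates only the TTD concept (the standard TT-SVD procedure of sequential truncated SVDs), not the paper's quantized hardware dataflow; the tensor shape, rank, and function names are assumptions chosen for the example.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor a d-way tensor into tensor-train (TT) cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                       # truncate to the target TT rank
        cores.append(u[:, :r].reshape(rank_prev, dims[k], r))
        mat = (np.diag(s[:r]) @ vt[:r]).reshape(r * dims[k + 1], -1)
        rank_prev = r
    cores.append(mat.reshape(rank_prev, dims[-1], 1))   # last core absorbs the remainder
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor (to check the approximation error)."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([core.shape[1] for core in cores])

# Toy example: a 64x64 "layer weight" folded into an 8x8x8x8 tensor, TT rank 4.
# Random weights are a worst case for compression; the achievable error depends on
# how low-rank the actual trained weights are.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
cores = tt_svd(w.reshape(8, 8, 8, 8), max_rank=4)
w_hat = tt_reconstruct(cores).reshape(64, 64)
stored = sum(core.size for core in cores)               # 320 stored values instead of 4096
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"stored values: {stored}/4096, relative error: {rel_err:.3f}")
```

Applying a layer decomposed this way means contracting the input with a chain of small cores, i.e., multiple serial small-size matrix multiplications; this is the MAC/QuantOp workload that the paper's TTD-CIM-matched dataflow, bit-level-sparsity-optimized macro, and lookup-table-based quantization unit are designed to make efficient.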

Keywords: bit-level sparsity; quantization; computing-in-memory (CIM); memory

Journal Title: IEEE Journal of Solid-State Circuits
Year Published: 2023
