Compute-in-memory (CIM) has gained prominence as a promising hardware architecture for machine-learning accelerators (MLAs) within the landscape of intelligent sensors (ISs). The acceleration of deep neural networks (DNNs) by MLAs highlights the need for improved energy efficiency. In recent years, CIM-aware DNN model compression techniques, such as low-precision quantization, have been extensively investigated to enhance the energy efficiency of tiny machine-learning (TinyML) models for edge devices. However, existing approaches focus primarily on post-training compression of pretrained models and overlook the energy consumed during compression-aware training. In this article, we propose a Hamming weight (HW)-based quantization framework, named HamQ, to enhance the energy efficiency of analog CIM. A key contribution of this work is a novel regularizer that reduces the HW of the quantized model weights, so that the crossbar can be implemented with fewer ON bit-cells. This constraint lowers the bitline currents in the crossbar arrays, which are often a major energy overhead in analog CIM accelerators. We analytically prove that HamQ shifts the probability density of the model weights toward low-HW values and away from high-HW values. Our method is evaluated on image classification and keyword spotting (KWS) tasks with TinyML models. Simulation results show that, compared with models trained without the regularizer, HamQ reduces per-inference energy consumption by 54.0% with a marginal accuracy degradation of 1.5% for the 8-bit ResNet-18 model on CIFAR-10 image classification, and by 42.7% with a 3.5% degradation for the 6-bit DS-CNN model on the KWS task.
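The abstract does not give the regularizer's exact form, so the following is only a minimal sketch of the general idea: quantize weights to integer codes with a straight-through estimator (STE), measure how many bit-cells each code would switch ON, and add a differentiable penalty that pulls codes toward low-HW values. The names quantize_ste, hamming_weight, total_loss, and lambda_hw are illustrative assumptions, as is the L1 surrogate used in place of HamQ's actual (unknown) penalty.

```python
import torch

def quantize_ste(w: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Uniform unsigned quantization to integer codes in [0, 2^n - 1].
    Rounding uses an STE so gradients flow as if it were the identity."""
    qmax = 2 ** n_bits - 1
    scale = (w.max() - w.min()).clamp_min(1e-8) / qmax
    x = (w - w.min()) / scale
    return x + (torch.round(x) - x).detach()

def hamming_weight(codes: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Exact per-weight popcount of the integer codes (non-differentiable);
    counts how many bit-cells would be ON in the crossbar for each weight."""
    q = codes.detach().round().long()
    bits = [(q >> k) & 1 for k in range(n_bits)]
    return torch.stack(bits).sum(dim=0).float()

def total_loss(task_loss: torch.Tensor, weights: torch.Tensor,
               n_bits: int = 8, lambda_hw: float = 1e-4) -> torch.Tensor:
    """Task loss plus an assumed L1 surrogate for the HW penalty. Low codes
    tend to have few ON bits, but HW is not monotone in the code value
    (e.g., 128 has HW 1 while 127 has HW 7), so this only approximates
    the HW-reduction objective that HamQ targets."""
    codes = quantize_ste(weights, n_bits)
    return task_loss + lambda_hw * codes.abs().mean()

# Usage: one regularized backward pass, then inspect the average ON-bit count.
w = torch.randn(64, 64, requires_grad=True)
loss = total_loss(torch.tensor(0.0), w, n_bits=8)
loss.backward()
print(hamming_weight(quantize_ste(w, 8), 8).mean())  # avg ON bits per weight
```

In such a setup, lambda_hw would control the trade-off the abstract reports: a larger penalty drives more weights toward low-HW codes (lower bitline currents, lower per-inference energy) at the cost of some accuracy.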
               