Network Pruning for Bit-Serial Accelerators

Bit-serial architectures (BSAs) are becoming increasingly popular in low-power neural network processor (NNP) designs for edge scenarios. However, the performance and energy efficiency of state-of-the-art BSA NNPs heavily depend on both the proportion and the distribution of ineffectual weight bits in neural networks (NNs). To boost the performance of typical BSA accelerators, we present Bit-Pruner, a software approach that learns BSA-favored NNs without resorting to hardware modifications. Bit-Pruner not only progressively prunes but also restructures the nonzero bits in weights, so that the number of nonzero bits in the model is reduced and the corresponding computation can be load-balanced to suit the target BSA accelerators. On top of Bit-Pruner, we further propose a Pareto-frontier optimization algorithm that adjusts the bit-pruning rate across network layers to fulfill diverse NN processing requirements in terms of performance and accuracy for various edge scenarios. However, aggressive bit-pruning can lead to nontrivial accuracy loss, especially for lightweight NNs and complex tasks. To this end, the alternating direction method of multipliers (ADMM) is adapted to the retraining phase of Bit-Pruner to smooth the abrupt disturbance caused by bit-pruning and improve the resulting model accuracy. According to the experiments, Bit-Pruner increases bit-sparsity by up to 94.4% with negligible accuracy degradation and achieves an optimized tradeoff between NN accuracy and energy efficiency even under very aggressive performance constraints. When the pruned models are deployed onto typical BSA accelerators, the average performance is 2.1× and 1.6× higher than that of baseline networks without pruning and networks with classical weight pruning, respectively.
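
To make the idea of bit-pruning concrete, below is a minimal Python sketch (not the paper's implementation) of one way to sparsify weight bits: assuming 8-bit quantized integer weights, each weight keeps only its most significant nonzero bits up to a fixed per-weight budget, which reduces the number of effectual bits a bit-serial accelerator must process. The function name and the per-weight bit budget are illustrative assumptions.

    import numpy as np

    def prune_weight_bits(w_int, max_nonzero_bits=2, num_bits=8):
        # Illustrative sketch: keep only the `max_nonzero_bits` most
        # significant set bits of each weight's magnitude, zeroing the rest.
        sign = np.sign(w_int)
        mag = np.abs(w_int).astype(np.int64)
        pruned = np.zeros_like(mag)
        kept = np.zeros_like(mag)
        # Scan bit positions from MSB to LSB, keeping set bits until the
        # per-weight budget is exhausted.
        for pos in range(num_bits - 1, -1, -1):
            bit = (mag >> pos) & 1
            take = (bit == 1) & (kept < max_nonzero_bits)
            pruned += take.astype(np.int64) << pos
            kept += take.astype(np.int64)
        return sign * pruned

    # Example: 93 (0b1011101) has five nonzero bits; with a budget of two,
    # only 64 + 16 = 80 survives, so a bit-serial MAC skips three cycles.
    w = np.array([93, -77, 5, -128], dtype=np.int64)
    print(prune_weight_bits(w))   # [ 80 -72   5 -128]

In the paper's actual flow the pruning is applied progressively and combined with restructuring for load balance and ADMM-based retraining to recover accuracy; the sketch above only shows the bit-level sparsification step in isolation.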

Keywords: network; bit; bit serial; accuracy; bit pruner; performance

Journal Title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year Published: 2023
