Although the existing single-instruction–multiple-data-like (SIMD) accelerators can handle the compressed format of sparse convolutional neural networks, the sparse and irregular distributions of nonzero elements cause low utilization of multipliers in… Click to show full abstract
Although the existing single-instruction–multiple-data-like (SIMD) accelerators can handle the compressed format of sparse convolutional neural networks, the sparse and irregular distributions of nonzero elements cause low utilization of multipliers in a processing engine (PE) and imbalanced computation between PEs. This brief addresses the above issues by proposing a data screening and task mapping (DSTM) accelerator which integrates a series of techniques, including software refinement and hardware modules. An efficient indexing module is introduced to identify the effectual computation pairs and skip unnecessary computation in a fine-grained manner. The intra-PE load imbalance is alleviated with weight data rearrangement. An effective task sharing mechanism further balances the computation between PEs. When compared with the state-of-the-art SIMD-like accelerator, the proposed DSTM enhances the average PE utilization by $3.5\times $ . The overall processing throughput is 59.7% higher than the previous design.
               
Click one of the above tabs to view related content.