This work introduces a digital SRAM-based near-memory compute macro for DNN inference, improving on-chip weight memory capacity and area efficiency compared to state-of-the-art digital computing-in-memory (CIM) macros. A $20\times256$ 1-16b reconfigurable digital computing near-memory (NM) macro is proposed, supporting reconfigurable 1-16b precision through a bit-serial computing scheme and a weight- and input-gating architecture for sparsity-aware operation. Each reconfigurable column MAC comprises $16\times$ custom-designed 7T SRAM bitcells that store a 1-16b weight, a conventional 6T SRAM cell for zero-weight-skip control, a bitwise multiplier, and a full adder with a register for partial-sum accumulation. The $20\times$ parallel partial-sum outputs are post-accumulated to generate a sub-partitioned output feature map, which is concatenated to produce the final convolution result. In addition, a pipelined array structure improves the throughput of the proposed macro. The proposed near-memory computing macro implements 80Kb of binary weight storage in a 0.473mm$^2$ die area in a 65nm process. It achieves an area efficiency of 4329-270.6 GOPS/mm$^2$ and an energy efficiency of 315.07-1.23 TOPS/W at 1-16b precision.
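The bit-serial MAC with zero-weight skipping can be summarized in a short behavioral model. The sketch below is a minimal illustration, not the macro's actual RTL: it assumes unsigned weights and activations and a simple one-weight-bit-per-cycle schedule; the function names (`column_mac_bit_serial`, `macro_mac`) are hypothetical, while the 20-way partitioning, zero-weight skip, bitwise multiply, and shift-and-accumulate steps follow the abstract.

```python
# Behavioral sketch of the proposed macro's dataflow (illustrative only;
# unsigned arithmetic and cycle scheduling are assumptions).

def column_mac_bit_serial(weights, activations, w_bits=8):
    """Model one sub-partition of reconfigurable column MACs.

    Each column stores a 1-16b weight in its 16 bitcells and reads it
    out one bit per cycle (bit-serial). A per-column zero-weight flag
    (modeling the 6T skip cell) gates the column when its weight is zero.
    """
    assert 1 <= w_bits <= 16
    acc = 0
    for cycle in range(w_bits):                 # one weight bit per cycle
        psum = 0
        for w, x in zip(weights, activations):
            if w == 0:                          # zero-weight skip (6T flag)
                continue
            w_bit = (w >> cycle) & 1            # serial read of one bitcell
            psum += w_bit * x                   # bitwise multiplier
        acc += psum << cycle                    # full adder + register:
                                                # shift-and-accumulate psums
    return acc


def macro_mac(weight_parts, activation_parts, w_bits=8):
    """Post-accumulate the 20 parallel partial-sum outputs."""
    return sum(column_mac_bit_serial(w, x, w_bits)
               for w, x in zip(weight_parts, activation_parts))


if __name__ == "__main__":
    # 20 sub-partitions of 256 columns each, unsigned 8b weights/inputs
    import random
    random.seed(0)
    W = [[random.choice([0, random.randrange(256)]) for _ in range(256)]
         for _ in range(20)]
    X = [[random.randrange(256) for _ in range(256)] for _ in range(20)]
    ref = sum(w * x for wp, xp in zip(W, X) for w, x in zip(wp, xp))
    assert macro_mac(W, X, w_bits=8) == ref
    print("bit-serial MAC matches reference:", ref)
```

Note that the zero-weight skip leaves the result unchanged (a zero weight contributes nothing to the dot product), which is why it can save energy without affecting accuracy.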