To achieve high energy efficiency for edge processing, we implement an all-digital shuffle-type group CNN (GCNN) accelerator with binary weights, which is, to the best of our knowledge, the first reported shuffle-type GCNN accelerator in CMOS. We propose cross-level optimizations spanning the architecture and hardware levels. At the architecture level, the NN is optimized to obtain a hardware-friendly configuration. At the hardware level, we propose a three-level storage hierarchy with latch-based computing-near-memory that reduces the total power by 25.8%. We also design novel shuffle hardware that reduces memory access to 1/3 of that required by the original shuffle, and a stage-by-stage gated MUX that reduces power consumption to 29% of that of the original selector circuit. Fabricated in a 28 nm CMOS process, the chip operates at 0.48-0.9 V and achieves its best energy efficiency of 1.23 μJ/inf at 0.552 V and 66 MHz, with 85.8% accuracy on the CIFAR-10 dataset. Its energy efficiency is 1.9-34.8× better than that of state-of-the-art works.
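
For readers unfamiliar with the shuffle operation in group CNNs, the following is a minimal software sketch, not the chip's hardware design. It assumes the commonly used ShuffleNet-style channel permutation and sign-based weight binarization; the abstract specifies neither, and the function names `channel_shuffle` and `binarize` are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Assumption: ShuffleNet-style permutation. The list is viewed as a
    groups x per_group matrix and read out column by column, so each
    output group receives channels from every input group.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


def binarize(weights):
    """Map real-valued weights to {-1, +1} by sign (assumed scheme)."""
    return [1 if w >= 0 else -1 for w in weights]


if __name__ == "__main__":
    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
    print(binarize([0.3, -1.2, 0.0]))
```

In software the permutation is an index remapping. A naive hardware version would presumably move each feature map through memory to reorder it, which is the kind of memory access the abstract reports cutting to 1/3.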
               