
CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency


We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks (TNNs). CUTIE, the completely unrolled ternary inference engine, focuses on minimizing noncomputational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by: 1) a data-path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data reuse; 2) targeting TNNs which, in contrast to binary NNs, allow for sparse weights that reduce switching activity; and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while decreasing the overall core inference energy cost by a factor of $4.8\times$ to $21\times$.
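The abstract does not detail the paper's optimized training method, but the core idea in point 2) — that ternary weights, unlike binary ones, can be exactly zero and thus suppress switching activity — can be illustrated with a standard threshold-based ternarization heuristic (in the style of ternary weight networks). The `delta_scale` parameter below is an assumed illustrative knob, not a value from the paper: raising it maps more weights to zero, increasing sparsity.

```python
import numpy as np

def ternarize(w, delta_scale=0.7):
    """Map real-valued weights to {-1, 0, +1}.

    Weights with |w| <= delta become 0; the rest keep their sign.
    A larger delta_scale yields more zeros (higher sparsity), and in a
    TNN accelerator a zero weight contributes no switching activity.
    """
    delta = delta_scale * np.mean(np.abs(w))
    return np.sign(w) * (np.abs(w) > delta)

# Example: ternarize a random 3x3 filter and measure its sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3))
t = ternarize(w)
sparsity = np.mean(t == 0)
```

Tuning the threshold upward is one simple way a training procedure can trade a small accuracy loss for higher weight sparsity, and hence lower dynamic energy.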

Keywords: energy; switching activity; inference

Journal Title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year Published: 2022

