Accelerating Recurrent Neural Networks: A Memory-Efficient Approach

Recurrent neural networks (RNNs) have achieved state-of-the-art performance on various sequence learning tasks due to their powerful sequence modeling capability. However, RNNs usually require a large number of parameters and incur high computational complexity, making it quite challenging to implement complex RNNs on embedded devices with stringent memory and latency requirements. In this paper, we first present a novel hybrid compression method for a widely used RNN variant, long short-term memory (LSTM), to tackle these implementation challenges. By properly using circulant matrices, forward nonlinear function approximation, and efficient quantization schemes with a retrain-based training strategy, the proposed compression method reduces memory usage by more than 95% with negligible accuracy loss, as verified on language modeling and speech recognition tasks. An efficient, scalable parallel hardware architecture is then proposed for the compressed LSTM. With an innovative chessboard division method for matrix-vector multiplications, the parallelism of the proposed hardware architecture can be freely chosen under a given latency requirement. Specifically, for the circulant matrix-vector multiplications employed in the compressed LSTM, the circulant matrices are judiciously reorganized to fit the chessboard division and minimize the number of memory accesses required for the matrix multiplications. The proposed architecture is modeled at the register-transfer level (RTL) and synthesized under TSMC 90-nm CMOS technology. With 518.5 kB of on-chip memory, it processes a 512 × 512 compressed LSTM in 1.71 µs, corresponding to 2.46 TOPS on the uncompressed model, at a cost of 30.77 mm² of chip area. The implementation results demonstrate that the proposed design achieves high flexibility and area efficiency, satisfying the needs of many real-time applications on embedded devices. It is worth mentioning that the memory-efficient approach to accelerating LSTMs developed in this paper is also applicable to other RNN variants.
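To make the compression idea concrete, the following is a minimal Python sketch (an illustration under stated assumptions, not the paper's implementation) of the circulant-matrix technique mentioned in the abstract: an n × n circulant block is fully determined by a single length-n vector, so a weight block can be stored with n values instead of n², and the matrix-vector product reduces to a circular convolution computable with the FFT in O(n log n) time. The function names below are hypothetical.

```python
import numpy as np

def circulant_matvec(first_row: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute C @ x, where C is the circulant matrix whose first row is
    `first_row`, without materializing C (storage O(n), time O(n log n))."""
    n = first_row.shape[0]
    assert x.shape[0] == n
    # A circulant matrix with first column c satisfies C @ x = ifft(fft(c) * fft(x)).
    # Convert the stored first row into the corresponding first column.
    first_col = np.concatenate(([first_row[0]], first_row[:0:-1]))
    return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    row = rng.standard_normal(n)
    x = rng.standard_normal(n)
    # Dense reference: row i of a circulant matrix is the first row rotated right i times.
    C = np.stack([np.roll(row, i) for i in range(n)])
    assert np.allclose(C @ x, circulant_matvec(row, x))
    print("FFT-based circulant matvec matches the dense product")
```

The "forward nonlinear function approximation" can likewise be illustrated with a common hardware-friendly choice: piecewise-linear approximations of the LSTM gate nonlinearities, which replace exponentials with clamped linear segments. This is an assumed stand-in, since the abstract does not specify the paper's exact approximation scheme.

```python
import numpy as np

def hard_sigmoid(x: np.ndarray) -> np.ndarray:
    # Piecewise-linear stand-in for the logistic sigmoid: slope 1/4 through
    # (0, 0.5), clamped to [0, 1]; needs only shifts, adds, and comparisons.
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

def hard_tanh(x: np.ndarray) -> np.ndarray:
    # Piecewise-linear stand-in for tanh: identity clamped to [-1, 1].
    return np.clip(x, -1.0, 1.0)
```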

Keywords: recurrent neural networks; LSTM; memory-efficient approach

Journal Title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year Published: 2017
