Convolutional layers dominate the computation and energy costs of Deep Neural Network (DNN) inference. Recent algorithmic works attempt to reduce these bottlenecks via compact DNN structures and model compression. Likewise,… Click to show full abstract
Convolutional layers dominate the computation and energy costs of Deep Neural Network (DNN) inference. Recent algorithmic works attempt to reduce these bottlenecks via compact DNN structures and model compression. Likewise, state-of-the-art accelerator designs leverage spatiotemporal characteristics of convolutional layers to reduce data movement overhead and improve throughput. Although both are independently effective at reducing latency and energy costs, combining these approaches does not guarantee cumulative improvements due to inefficient mapping. This inefficiency can be attributed to (1) inflexibility of underlying hardware and (2) inherent reduction of data-reuse opportunities of compact DNN structures. To address these issues, we propose a dynamically reshaping, high data-reuse PE array accelerator, namely
               
Click one of the above tabs to view related content.