Convolutional Neural Networks (CNNs) are widely adopted for Machine Learning (ML) tasks such as classification and computer vision. GPUs have become the reference platform for both the training and inference phases of CNNs because their architecture is well suited to CNN operators. However, GPUs are power-hungry. One path to deploying CNNs on energy-constrained devices is to adopt hardware accelerators for the inference phase. The literature, however, lacks analyses and comparisons of these accelerators that evaluate Power-Performance-Area (PPA) trade-offs. Typically, PPA is estimated from the number of operations executed during inference, such as the number of multiply-accumulate operations (MACs), which may not be a good proxy for PPA. Accurate hardware estimates are therefore needed to enable design space exploration (DSE) and to deploy CNNs according to the design constraints. This work proposes a fast and accurate DSE approach for CNNs using an analytical model fitted to data from the physical synthesis of hardware accelerators. The model is integrated with CNN frameworks, such as TensorFlow, to generate accurate results. The analytical model estimates area, performance, power, energy, and memory accesses. The observed average error of the analytical model, compared with the data obtained from physical synthesis, is below 7%.
               