Deep learning models have evolved into powerful tools for many artificial intelligence tasks. However, deploying deep neural networks in real-world applications remains challenging due to their high computational complexity and storage overhead. In this paper, we propose an interactive neural network compression mechanism comprising out-in-channel pruning and neural network quantization. Whereas many channel pruning methods apply structured sparsity regularization to each layer separately, we consider correlations between successive layers to retain the predictive power of the compact network. A global greedy pruning algorithm is designed to iteratively remove redundant out-in-channels. Moreover, to address the shortcomings of one-shot quantization, we propose an incremental quantization algorithm along the output-channel dimension, which smooths network fluctuations and recovers accuracy better during retraining. Our mechanism is comprehensively evaluated with various Convolutional Neural Network (CNN) architectures on popular datasets. Notably, on ImageNet-1K, out-in-channel pruning reduces FLOPs by 54.0% on AlexNet and by 50.0% on ResNet-50 with only 0.15% and 0.37% top-1 accuracy drops, respectively. On classification and style transfer tasks, the advantage of incremental quantization grows as the number of quantization bits decreases.
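To make the two components of the abstract more concrete, the following is a minimal PyTorch sketch of the out-in-channel pruning idea: the out-channels of one convolution are coupled with the matching in-channels of the next convolution, and the weakest coupled pairs are removed greedily. The scoring rule (joint L2 norm) and the helper names `coupled_channel_scores` and `prune_pair` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of out-in-channel pruning across two successive conv layers.
# Assumption: importance of a coupled channel = L2 norm of conv_a's out-channel
# filter plus L2 norm of conv_b's matching in-channel slice.
import torch
import torch.nn as nn


def coupled_channel_scores(conv_a: nn.Conv2d, conv_b: nn.Conv2d) -> torch.Tensor:
    """Score each out-in-channel pair shared by two successive conv layers."""
    out_norm = conv_a.weight.detach().flatten(1).norm(dim=1)                  # [C]
    in_norm = conv_b.weight.detach().transpose(0, 1).flatten(1).norm(dim=1)   # [C]
    return out_norm + in_norm


def prune_pair(conv_a: nn.Conv2d, conv_b: nn.Conv2d, keep: torch.Tensor):
    """Rebuild both layers, keeping only the selected coupled channels."""
    new_a = nn.Conv2d(conv_a.in_channels, keep.numel(), conv_a.kernel_size,
                      conv_a.stride, conv_a.padding, bias=conv_a.bias is not None)
    new_b = nn.Conv2d(keep.numel(), conv_b.out_channels, conv_b.kernel_size,
                      conv_b.stride, conv_b.padding, bias=conv_b.bias is not None)
    new_a.weight.data = conv_a.weight.data[keep].clone()      # keep out-channels of conv_a
    if conv_a.bias is not None:
        new_a.bias.data = conv_a.bias.data[keep].clone()
    new_b.weight.data = conv_b.weight.data[:, keep].clone()   # keep matching in-channels of conv_b
    if conv_b.bias is not None:
        new_b.bias.data = conv_b.bias.data.clone()
    return new_a, new_b


# Greedy loop: repeatedly drop the lowest-scoring coupled channel until a
# target width is reached (retraining between steps is omitted here).
conv1 = nn.Conv2d(3, 64, 3, padding=1)
conv2 = nn.Conv2d(64, 128, 3, padding=1)
target_channels = 32
while conv1.out_channels > target_channels:
    scores = coupled_channel_scores(conv1, conv2)
    keep = scores.argsort(descending=True)[:-1].sort().values  # drop the weakest pair
    conv1, conv2 = prune_pair(conv1, conv2, keep)
print(conv1, conv2)
```

Similarly, the incremental quantization described above can be sketched as quantizing a growing subset of output channels per step and retraining in between, rather than quantizing a whole layer at once. The uniform per-channel quantizer and the 25%-per-step schedule below are assumptions for illustration only.

```python
# Illustrative sketch of incremental quantization along the output-channel dimension.
import torch


def quantize_channels(weight: torch.Tensor, channel_idx: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniformly quantize the selected output channels of a conv weight [C_out, C_in, kH, kW]."""
    w = weight.clone()
    levels = 2 ** (bits - 1) - 1
    for c in channel_idx.tolist():
        scale = w[c].abs().max() / levels
        if scale > 0:
            w[c] = torch.round(w[c] / scale) * scale
    return w


weight = torch.randn(64, 32, 3, 3)
order = weight.flatten(1).norm(dim=1).argsort(descending=True)  # e.g. quantize stronger channels first
for step in range(4):                                           # 25% of the output channels per step
    chunk = order[step * 16:(step + 1) * 16]
    weight = quantize_channels(weight, chunk, bits=4)
    # ...retrain the still-full-precision channels here to recover accuracy before the next step
```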