Filter pruning is advocated for accelerating deep neural networks without dedicated hardware or libraries, while maintaining high prediction accuracy. Several works have cast pruning as a variant of ℓ1-regularized training, which entails two challenges: 1) the ℓ1-norm is not scaling-invariant (i.e., the regularization penalty depends on the weight values) and 2) there is no rule for selecting the penalty coefficient to trade off a high pruning ratio against a low accuracy drop. To address these issues, we propose a lightweight pruning method termed adaptive sensitivity-based pruning (ASTER), which: 1) achieves scaling invariance by refraining from modifying the weights of unpruned filters and 2) adjusts the pruning threshold dynamically, concurrently with the training process. ASTER computes the sensitivity of the loss to the threshold on the fly (without retraining); this is carried out efficiently by applying L-BFGS solely to the batch normalization (BN) layers. It then adapts the threshold so as to maintain a fine balance between pruning ratio and model capacity. We have conducted extensive experiments with several state-of-the-art CNN models on benchmark datasets to illustrate the merits of our approach in terms of both FLOPs reduction and accuracy. For example, on ILSVRC-2012 our method reduces the FLOPs of ResNet-50 by more than 76% with only a 2.0% Top-1 accuracy drop, while for MobileNet v2 it achieves a 46.6% FLOPs reduction with a Top-1 accuracy drop of only 2.77%. Even for a very lightweight classification model such as MobileNet v3-small, ASTER saves 16.1% of FLOPs with a negligible Top-1 accuracy drop of 0.03%.
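To illustrate the general idea of threshold-based pruning on BN scale factors with an adaptively tuned threshold, here is a minimal, hypothetical PyTorch sketch. It is not the authors' implementation: ASTER estimates the loss sensitivity by applying L-BFGS to the BN layers, whereas this sketch substitutes a plain finite-difference proxy for that sensitivity, and all names (TinyConvNet, masked_loss, sensitivity, the step size and tolerance values) are illustrative assumptions rather than anything taken from the paper.

```python
# Hypothetical sketch of BN-scale threshold pruning with an adaptive threshold,
# loosely following the ASTER description above. The paper computes the loss
# sensitivity with L-BFGS restricted to the BN layers; a finite-difference
# estimate is substituted here for brevity.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Toy CNN with BN layers, used only to make the sketch self-contained."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def masked_loss(model, tau, x, y, criterion):
    """Loss when every filter whose BN scale magnitude falls below tau is
    silenced via a forward hook; the underlying weights stay untouched."""
    hooks = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            keep = (m.weight.detach().abs() >= tau).float().view(1, -1, 1, 1)
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out, keep=keep: out * keep))
    try:
        loss = criterion(model(x), y)
    finally:
        for h in hooks:
            h.remove()
    return loss

def sensitivity(model, tau, x, y, criterion, delta=1e-3):
    """Finite-difference proxy for dL/d(tau); ASTER instead obtains this
    quantity via L-BFGS on the BN layers, without retraining."""
    with torch.no_grad():
        l0 = masked_loss(model, tau, x, y, criterion)
        l1 = masked_loss(model, tau + delta, x, y, criterion)
    return (l1 - l0) / delta

# Usage: raise the threshold while the loss is insensitive to it, lower it
# once pruning starts to hurt (step size and tolerance are made-up values).
model, criterion = TinyConvNet(), nn.CrossEntropyLoss()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
tau, step, tol = 0.0, 1e-2, 0.05
for _ in range(20):
    s = sensitivity(model, tau, x, y, criterion)
    tau = tau + step if s.item() < tol else max(tau - step, 0.0)
print(f"final threshold: {tau:.3f}")
```

Silencing filters through forward hooks rather than editing their weights mirrors the abstract's point that unpruned filter weights are left unmodified, which is what makes the criterion scaling-invariant in the authors' framing.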