NeuRes: Highly Activated Neurons Responses Transfer via Distilling Sparse Activation Maps

In recent years, knowledge distillation has attracted significant interest for mobile, edge, and IoT devices due to its ability to transfer knowledge from a large, complex teacher to a lightweight student network. Intuitively, knowledge distillation forces the student to mimic the teacher’s neuron responses, improving the student’s generalization by deploying distillation losses as regularization terms. However, the non-linearity of the hidden layers and the high dimensionality of the feature maps make knowledge transfer a challenging task. Although numerous methods transfer the teacher’s neuron responses in the form of diverse feature characteristics, such as attention and contrastive representations, to the best of our knowledge no prior work has considered feature-level non-linearity during distillation. In this work, we ask: can feature-level non-linearity-based approaches improve student performance? To investigate this question, we propose a novel knowledge distillation technique called NeuRes (Neuron Responses), which distills Sparse Activation Maps (SAMs) to transfer the highly activated neuron responses to the student and enhance its representation capability. NeuRes selects the highly activated neuron responses to produce SAMs and transfers the knowledge based on activation normalization. NeuRes also transfers translation-invariant features using auxiliary classifiers and augmented data to further improve the student’s generalization. Detailed ablation studies and extensive experiments on model compression, transferability, adversarial robustness, and few-shot learning verify that NeuRes outperforms state-of-the-art distillation techniques on standard benchmark datasets.
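The abstract gives no implementation details, but the core mechanism it describes (selecting highly activated neuron responses to form sparse activation maps, normalizing activations, and using the result as a distillation regularizer) can be illustrated with a short sketch. The PyTorch code below is a hypothetical reconstruction, not the authors’ released implementation: the function name `sam_distill_loss`, the `keep_ratio` parameter, and the per-sample top-k channel selection rule are assumptions made for illustration, and the paper’s exact SAM formulation may differ.

```python
import torch
import torch.nn.functional as F

def sam_distill_loss(teacher_feat: torch.Tensor,
                     student_feat: torch.Tensor,
                     keep_ratio: float = 0.5) -> torch.Tensor:
    """Distillation loss over only the teacher's most highly activated channels.

    teacher_feat, student_feat: (B, C, H, W) feature maps from matching layers,
    assumed to have been projected to the same shape beforehand.
    keep_ratio: fraction of channels retained per sample (controls sparsity),
    ranked by the teacher's mean absolute activation.
    """
    B, C, H, W = teacher_feat.shape

    # Activation normalization: L2-normalize each channel's spatial map so the
    # student matches activation patterns rather than raw magnitudes.
    t = F.normalize(teacher_feat.flatten(2), dim=2).view(B, C, H, W)
    s = F.normalize(student_feat.flatten(2), dim=2).view(B, C, H, W)

    # Score each channel by the teacher's mean absolute activation and keep
    # only the top-k most highly activated channels per sample.
    scores = teacher_feat.abs().mean(dim=(2, 3))               # (B, C)
    k = max(1, int(C * keep_ratio))
    top_idx = scores.topk(k, dim=1).indices                    # (B, k)
    mask = torch.zeros_like(scores).scatter_(1, top_idx, 1.0)  # sparse channel mask
    mask = mask.view(B, C, 1, 1)

    # MSE between the masked (sparse) normalized activation maps.
    return F.mse_loss(s * mask, t * mask)
```

In practice, such a term would be added to the student’s cross-entropy objective with a weighting coefficient, consistent with the abstract’s description of distillation losses as regularization terms. The auxiliary-classifier and data-augmentation components used for translation invariance are omitted from this sketch.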

Keywords: distillation; sparse activation; highly activated; student; activation maps; activation

Journal Title: IEEE Access
Year Published: 2022
