
Progressive Network Grafting With Local Features Embedding for Few-Shot Knowledge Distillation



Compared with traditional knowledge distillation, which relies on a large amount of data, few-shot knowledge distillation can distill student networks with good performance using only a small number of samples. Some recent studies treat the network as a combination of a series of network blocks, adopt a progressive grafting strategy, and use the output of the teacher network to distill the student network. However, this strategy ignores the importance of the local feature information generated by each teacher block, which indicates what features the corresponding student block should learn. In this paper, we argue that the features output by the teacher block can guide the student block to learn more useful information from the teacher. Therefore, we propose a joint learning framework for few-shot knowledge distillation that exploits both the output of the teacher network and the local features generated by the teacher blocks to optimize the student network. The local features guide each student block to learn the output of the corresponding teacher block, while the output of the teacher network allows the student network to use its learned local features to better contribute to classification. In addition, we carry out further model compression by reducing the number of network channels, yielding a series of student networks with fewer parameters. Finally, extensive experiments on the CIFAR10 and CIFAR100 datasets show that our method outperforms state-of-the-art methods and retains considerable advantages even with a very small number of parameters in the further compression experiments.
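The joint objective sketched in the abstract combines a block-level local-feature term with a standard output-level distillation term. The following PyTorch snippet is only an illustrative sketch of that idea; the choice of mean-squared-error feature matching, the temperature, the loss weight, and the function name are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def joint_distillation_loss(student_block_feats, teacher_block_feats,
                            student_logits, teacher_logits,
                            temperature=4.0, alpha=0.5):
    """Illustrative joint loss: block-level feature guidance plus network-level KD."""
    # Local-feature term: each grafted student block mimics the features
    # produced by the corresponding teacher block (MSE is an assumption).
    feat_loss = sum(F.mse_loss(s, t.detach())
                    for s, t in zip(student_block_feats, teacher_block_feats))

    # Output term: standard knowledge distillation on the softened logits
    # of the whole teacher network.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # alpha balances the two terms; the actual weighting in the paper may differ.
    return alpha * feat_loss + (1.0 - alpha) * kd_loss
```

In a progressive grafting setup, a loss of this form would typically be applied as each student block is grafted into the teacher network in turn, so that the block is trained against both its teacher block's features and the network's final output.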

Keywords: network; local features; knowledge distillation; block; student

Journal Title: IEEE Access
Year Published: 2022



