Anti-aliased convolutional neural networks (CNNs) are models that introduce blur filters into the intermediate representations of CNNs to achieve high accuracy in image recognition tasks. A promising way to prepare a new anti-aliased CNN is to introduce blur filters into the intermediate representations of a pre-trained (non-anti-aliased) CNN, since many researchers have released such pre-trained models online. Although this scheme makes it easy to build a new anti-aliased CNN, the blur filters drastically degrade the pre-trained representations, so fine-tuning on massive amounts of training data is often required to take full advantage of the blur filters. This is problematic because training data are often limited; in such a "data-limited" situation, fine-tuning does not yield high performance because it induces overfitting to the limited training data. To tackle this problem, we propose "knowledge transferred fine-tuning." Knowledge transfer is a technique that exploits the representations of a pre-trained model to help ensure generalization in data-limited situations. Inspired by this concept, we transfer knowledge from the intermediate representations of the pre-trained CNN to the anti-aliased CNN during fine-tuning. The key idea of our method is to transfer only the knowledge essential for image recognition from the pre-trained CNN, using two types of loss: a pixel-level loss and a global-level loss. The pixel-level loss transfers detailed knowledge from the pre-trained CNN, but this knowledge may contain "aliased," non-essential knowledge. The global-level loss, in contrast, is designed to increase when the pixel-level loss transfers non-essential knowledge while ignoring the essential knowledge, i.e., it penalizes the pixel-level loss. Experimental results demonstrate that the proposed method, using just 25 training images per class on ImageNet 2012, achieves higher accuracy than a conventional pre-trained CNN.
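For concreteness, below is a minimal sketch of how a blur filter can be inserted into a CNN's intermediate representations in the spirit of anti-aliased CNNs: a fixed, depthwise blur applied before strided downsampling. The module name `BlurPool2d` and the 3x3 binomial kernel are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Fixed binomial blur followed by strided downsampling
    (hypothetical module, in the style of anti-aliased CNNs)."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.channels = channels
        # 3x3 binomial (approximately Gaussian) kernel, normalized to sum to 1.
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)
        k = k / k.sum()
        # One identical filter per channel (depthwise convolution), kept as a
        # non-trainable buffer so it is not updated during fine-tuning.
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reflect-pad so spatial size is reduced only by the stride.
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=self.stride, groups=self.channels)
```

With such a module, a stride-2 operation in a pre-trained CNN could be split into its stride-1 counterpart followed by `BlurPool2d(channels, stride=2)`, reusing the pre-trained weights; this is the step that, as the abstract notes, degrades the pre-trained representations and motivates fine-tuning.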
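The abstract does not give exact definitions of the two losses, so the following is only a hedged sketch: it assumes the pixel-level loss is a per-pixel MSE between corresponding intermediate feature maps of the pre-trained (teacher) CNN and the anti-aliased (student) CNN, and that the global-level loss compares globally pooled features, which averages away pixel-level "aliased" detail. The function names and the weights `alpha`/`beta` are hypothetical:

```python
import torch
import torch.nn.functional as F

def pixel_level_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    """Per-pixel distance between intermediate feature maps (assumed MSE).
    Transfers detailed knowledge, possibly including aliased detail."""
    return F.mse_loss(f_student, f_teacher)

def global_level_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    """Distance between globally pooled features (assumed form). Pooling
    averages out pixel-level aliased detail, so this term grows when the
    globally essential content drifts, penalizing the pixel-level transfer."""
    g_s = f_student.mean(dim=(2, 3))  # (N, C) global average pooling
    g_t = f_teacher.mean(dim=(2, 3))
    return F.mse_loss(g_s, g_t)

def transfer_loss(f_student: torch.Tensor,
                  f_teacher: torch.Tensor,
                  task_loss: torch.Tensor,
                  alpha: float = 1.0,
                  beta: float = 1.0) -> torch.Tensor:
    """Total fine-tuning objective: the recognition task loss plus the two
    transfer terms; alpha and beta are illustrative hyperparameters."""
    return (task_loss
            + alpha * pixel_level_loss(f_student, f_teacher)
            + beta * global_level_loss(f_student, f_teacher))
```

During fine-tuning, `f_teacher` would be taken from the frozen pre-trained CNN and `f_student` from the anti-aliased CNN at the same intermediate layer, with gradients flowing only into the student.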