Abstract Mid-level element-based representations have proven very effective for visual recognition. This paper presents a method for discovering discriminative mid-level visual elements based on deep Convolutional Neural Networks (CNNs). We present a part-level CNN architecture, the Part-based CNN (P-CNN), which serves as the encoding module in a part-based representation model. The P-CNN can be attached to an arbitrary layer of a pre-trained CNN and trained using only image-level labels. Training the P-CNN essentially corresponds to optimizing and selecting discriminative mid-level visual elements. For an input image, the output of the P-CNN is the part-based coding itself, which can be used directly for image recognition. Applying P-CNN to multiple layers of a pre-trained CNN yields more diverse visual elements for visual recognition. We validate the proposed P-CNN on several visual recognition tasks, including scene categorization, action classification, and multi-label object recognition. Extensive experiments demonstrate that P-CNN performs competitively with state-of-the-art methods.
               