Multi-label image classification aims to predict the labels associated with a given image. While most existing methods rely on a unified image representation, extracting label-specific features through input-space learning can improve the discriminative power of the learned features. Moreover, most feature-learning studies ignore learning in the output label space, even though exploiting label correlations can boost classification performance. In this paper, we propose a deep learning framework with flexible modules that learn from both the input and output spaces for multi-label image classification. For input-space learning, we devise a label-specific feature pooling method that refines convolutional features into features specific to each label. For output-space learning, we design a Two-Stream Graph Convolutional Network (TSGCN) that learns multi-label classifiers by modeling both spatial object relationships and semantic label correlations. More specifically, we build object spatial graphs to characterize the spatial relationships among objects in an image, which complement the label semantic graphs that model semantic label correlations. Experimental results on two popular benchmark datasets (Pascal VOC and MS-COCO) show that the proposed method outperforms state-of-the-art approaches.
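To make the two ideas concrete, below is a minimal, hypothetical PyTorch sketch of (1) label-specific pooling over a convolutional feature map and (2) a two-stream GCN that turns a semantic label graph and a spatial object graph into per-label classifiers. The module names, layer sizes, the attention-style pooling, and the average fusion of the two streams are all illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of label-specific pooling + a two-stream GCN (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelSpecificPooling(nn.Module):
    """Pool a conv feature map into one feature vector per label via per-label attention."""

    def __init__(self, feat_dim, num_labels):
        super().__init__()
        # One 1x1-conv score map per label (an assumed, simple attention form).
        self.attn = nn.Conv2d(feat_dim, num_labels, kernel_size=1)

    def forward(self, fmap):
        # fmap: (B, D, H, W) convolutional features from a CNN backbone.
        scores = self.attn(fmap).flatten(2).softmax(-1)     # (B, L, H*W)
        feats = fmap.flatten(2)                             # (B, D, H*W)
        return torch.einsum("blk,bdk->bld", scores, feats)  # (B, L, D) label-specific features


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, adj, feats):
        return F.relu(adj @ self.proj(feats))


class TwoStreamGCN(nn.Module):
    """Fuses a semantic label-correlation graph and a spatial object-relation
    graph into per-label classifiers, then scores label-specific features."""

    def __init__(self, num_labels, emb_dim, feat_dim):
        super().__init__()
        self.sem_gcn1 = GCNLayer(emb_dim, feat_dim)
        self.sem_gcn2 = GCNLayer(feat_dim, feat_dim)
        self.spa_gcn1 = GCNLayer(emb_dim, feat_dim)
        self.spa_gcn2 = GCNLayer(feat_dim, feat_dim)

    def forward(self, sem_adj, spa_adj, label_emb, label_feats):
        # sem_adj: (L, L) normalized label co-occurrence graph (dataset-level).
        # spa_adj: (B, L, L) normalized object spatial-relation graph (per image).
        # label_emb: (L, E) label word embeddings; label_feats: (B, L, D).
        w_sem = self.sem_gcn2(sem_adj, self.sem_gcn1(sem_adj, label_emb))  # (L, D)
        w_spa = self.spa_gcn2(spa_adj, self.spa_gcn1(spa_adj, label_emb))  # (B, L, D)
        classifiers = 0.5 * (w_sem + w_spa)         # assumed fusion: simple average
        return (classifiers * label_feats).sum(-1)  # (B, L) multi-label logits
```

In use, `label_feats` would come from `LabelSpecificPooling` applied to the backbone's output, `sem_adj` from thresholded label co-occurrence statistics, and `spa_adj` from the spatial layout of detected objects in each image; the averaging step is only a stand-in for whatever fusion the paper actually employs.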
               