Accurately categorizing million-scale Internet users (e.g., of Flickr or Google Picasa) into multiple communities based on their genre tastes is an indispensable technique in machine learning and multimedia. It can facilitate a series of applications, such as fashion recommendation and 3D non-realistic photo rendering. Conventional methods cannot handle this task appropriately because of the inherently contaminated image labels, which are produced by auxiliary image label predictors. In this article, we propose a noise-tolerant deep architecture that optimally encodes stable templates discovered from a collection of images with contaminated semantic labels. (The stable template is a new concept that we propose. It denotes the distribution of co-occurring semantic categories over an image set. Both theoretical and empirical analyses have demonstrated that this distribution remains almost unchanged in the presence of contaminated image tags.) Specifically, we first construct a semantic space by encoding image labels using manifold embedding. Afterward, we observe that in the semantic space, the distribution of superpixels from images with the same label remains stable, regardless of the noise in the image labels. Based on this observation, a probabilistic generative model (Hidden Stable Analysis) is proposed to learn the stable templates for each image label. To globally represent the composition of a user's images, a deep aggregation network is developed, which statistically concatenates the CNN features learned from all of the generated stable templates. Subsequently, an affinity graph is built, in which the genre difference among users is determined by their deep features. Finally, we employ a dense subgraph discovery technique that effectively mines the communities corresponding to various genre tastes. Experiments on a million-scale image set (more than 1.4 million images) compiled from Flickr have demonstrated the effectiveness of our method.
Additionally, an empirical study on the 33 SIFT-flow categories has shown that the detected stable templates remain almost unchanged under nearly 32% contaminated image labels.
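
The last two stages of the pipeline (building an affinity graph over users and mining dense subgraphs as communities) can be illustrated with a minimal sketch. The abstract does not specify the kernel, the threshold, or the dense subgraph algorithm, so everything below is an assumption made for illustration: a Gaussian kernel on toy 2-D feature vectors stands in for the deep features, and greedy peeling (a standard densest-subgraph approximation) stands in for the authors' discovery technique. The names `affinity_graph`, `densest_subgraph`, and `mine_communities` are hypothetical.

```python
import math


def affinity_graph(features, sigma=1.0, threshold=0.5):
    """Build a weighted undirected graph over users.

    Assumption: edge weight is a Gaussian kernel of the squared Euclidean
    distance between feature vectors; weak edges are dropped.
    """
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            weight = math.exp(-dist_sq / (2 * sigma ** 2))
            if weight >= threshold:
                graph[i][j] = weight
                graph[j][i] = weight
    return graph


def densest_subgraph(graph):
    """Greedy peeling: repeatedly remove the node of lowest weighted degree
    and remember the node set with the highest (edge weight / node count)."""
    adj = {u: dict(nbrs) for u, nbrs in graph.items()}
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    total_weight = sum(degree.values()) / 2
    remaining = set(adj)
    best_nodes, best_density = set(), -1.0
    while remaining:
        density = total_weight / len(remaining)
        if density > best_density:
            best_nodes, best_density = set(remaining), density
        u = min(remaining, key=lambda v: (degree[v], v))
        for v, w in adj[u].items():
            degree[v] -= w
            del adj[v][u]
            total_weight -= w
        remaining.remove(u)
        del adj[u]
    return best_nodes, best_density


def mine_communities(graph, min_size=2):
    """Extract communities one at a time, removing each from the graph."""
    work = {u: dict(nbrs) for u, nbrs in graph.items()}
    communities = []
    while work:
        nodes, density = densest_subgraph(work)
        if len(nodes) < min_size or density <= 0:
            break
        communities.append(nodes)
        work = {
            u: {v: w for v, w in nbrs.items() if v not in nodes}
            for u, nbrs in work.items()
            if u not in nodes
        }
    return communities


# Toy stand-in for per-user deep features: two well-separated groups.
toy_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
toy_communities = mine_communities(affinity_graph(toy_features))
```

On this toy input, the three users near the origin form the first community and the two users near (5, 5) form the second. At the scale reported in the abstract, the quadratic pairwise loop would presumably need to be replaced by an approximate nearest-neighbor graph construction.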
               