Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time consuming and labor intensive. To reduce the dependence on manually labeled data, there have been increasing research efforts on learning visual classifiers directly from web images. One issue that limits their performance is polysemy. Existing unsupervised approaches attempt to reduce the influence of visual polysemy by filtering out irrelevant images, but they do not address polysemy directly. To this end, in this paper, we present a multimodal framework that tackles polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses from untagged corpora and use them to retrieve sense-specific images. Then, we merge visually similar semantic senses and prune noise using the retrieved images. Finally, we train one visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and reranking search results demonstrate the superiority of our proposed approach.
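The pipeline described above can be summarized in three stages: sense discovery, sense-specific retrieval with merging and noise pruning, and per-sense classifier training. The following Python sketch illustrates that structure only; it is not the authors' implementation. The helpers `discover_senses` and `retrieve_image_features` are hypothetical stand-ins (the paper discovers senses from untagged text corpora and retrieves real web images), and the merge/prune step is omitted for brevity.

```python
# Illustrative sketch of a sense-specific classifier pipeline (not the paper's code).
# Sense discovery and image retrieval are stubbed with placeholders; only the
# per-sense classifier training step uses real tooling (scikit-learn).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def discover_senses(query):
    # Placeholder: the paper discovers candidate senses from untagged corpora.
    # Here we simply hard-code two hypothetical senses for the query.
    return [f"{query} (sense A)", f"{query} (sense B)"]

def retrieve_image_features(sense, n=50, dim=128):
    # Placeholder for sense-specific web image search + visual feature extraction.
    # Each sense gets a distinct synthetic feature distribution.
    return rng.normal(loc=hash(sense) % 7, scale=1.0, size=(n, dim))

def train_sense_classifiers(query):
    senses = discover_senses(query)
    features = {s: retrieve_image_features(s) for s in senses}
    # A real system would first merge visually similar senses and prune
    # noisy retrievals before training; that step is omitted here.
    classifiers = {}
    for s in senses:
        pos = features[s]
        neg = np.vstack([f for t, f in features.items() if t != s])
        X = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        # One binary classifier per selected semantic sense.
        classifiers[s] = LinearSVC(max_iter=5000).fit(X, y)
    return classifiers

classifiers = train_sense_classifiers("apple")
print(list(classifiers))
```

In this sketch, each sense-specific classifier is trained one-vs-rest against the other senses, which is one simple way to realize "one visual classifier for each selected semantic sense"; the actual training setup in the paper may differ.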