Zero-shot learning (ZSL) for visual recognition aims to accurately recognize objects of unseen classes by mapping visual features to an embedding space spanned by class semantic information. However, the semantic gap between visual features and their underlying semantics remains a major obstacle in ZSL. Conventional ZSL methods that construct this mapping typically operate on the original visual features, which are learned independently of the ZSL task, thus degrading prediction performance. In this paper, we propose an effective method to uncover an appropriate latent representation of data for the purpose of zero-shot classification. Specifically, we formulate a novel framework that jointly learns the latent subspace and the cross-modal embedding linking visual features to their semantic representations. The proposed framework combines feature learning and semantics prediction, such that the learned data representation is more discriminative for predicting the semantic vectors, hence improving the overall classification performance. To learn a robust latent subspace, we explicitly avoid information loss by ensuring the reconstruction ability of the obtained data representation. An efficient algorithm is designed to solve the proposed optimization problem. To fully exploit the intrinsic geometric structure of the data, we develop a manifold regularization strategy that refines the learned semantic representations, further improving classification performance. To validate the effectiveness of the proposed approach, extensive experiments are conducted on three ZSL benchmarks, and encouraging results are achieved compared with state-of-the-art ZSL methods.
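The joint objective described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's actual algorithm: it combines squared-error reconstruction, semantics prediction, and a graph-Laplacian manifold term, and optimizes with plain gradient descent (the paper designs its own solver). All variable names (`P`, `W`, `lam`, `mu`) and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, a = 50, 20, 8, 5          # samples, visual dim, latent dim, semantic dim
X = rng.standard_normal((n, d))    # visual features
S = rng.standard_normal((n, a))    # per-sample class semantic vectors (e.g. attributes)

# k-NN graph Laplacian used by the manifold regularizer
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = (D2 <= np.sort(D2, axis=1)[:, [5]]).astype(float)   # ~5 nearest neighbours
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)
Lap = np.diag(A.sum(1)) - A

P = 0.1 * rng.standard_normal((d, k))   # visual -> latent projection
W = 0.1 * rng.standard_normal((k, a))   # latent -> semantic embedding
lam, mu, lr = 1.0, 0.1, 1e-4            # trade-off weights and step size (assumed)

def loss(P, W):
    Z = X @ P
    rec = ((Z @ P.T - X) ** 2).sum()         # reconstruction: avoid information loss
    sem = ((Z @ W - S) ** 2).sum()           # semantics prediction from latent codes
    man = np.trace(W.T @ Z.T @ Lap @ Z @ W)  # smoothness over the data manifold
    return rec + lam * sem + mu * man

start = loss(P, W)
for _ in range(300):            # plain gradient descent stands in for the
    Z = X @ P                   # paper's efficient solver
    E = Z @ P.T - X
    R = Z @ W - S
    gP = (2 * (X.T @ E @ P + E.T @ X @ P)
          + 2 * lam * X.T @ R @ W.T
          + 2 * mu * X.T @ Lap @ Z @ W @ W.T)
    gW = 2 * lam * Z.T @ R + 2 * mu * Z.T @ Lap @ Z @ W
    P -= lr * gP
    W -= lr * gW
end = loss(P, W)
```

At test time, an unseen-class image `x` would be mapped to the semantic space as `x @ P @ W` and assigned to the unseen class whose semantic vector is nearest, the standard ZSL prediction rule.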