Supervised cross-modal image-text hashing has attracted extensive attention for understanding the correspondence between vision and language in data search tasks. Existing methods learn compact hash codes from given image-text pairs or supervised information to capture this correspondence. However, they still face two notable drawbacks. First, they do not jointly exploit multiple kinds of semantic information, which yields suboptimal search performance. Second, most of them adopt a continuous relaxation strategy that discards the discrete constraints, resulting in large binary quantization errors. To address these problems, we propose a novel supervised hashing method, termed Discrete Joint Semantic Alignment Hashing (DJSAH). Specifically, it builds a connection between semantics (i.e., class labels and pairwise similarities) through joint semantic alignment learning, so that high-level discriminative semantics are preserved in the hash codes. In addition, a well-designed discrete optimization algorithm with linear computation and memory cost is developed to reduce the information loss of the hash codes without any relaxation. Extensive experiments and analyses on three benchmark datasets validate the superiority of the proposed DJSAH over several state-of-the-art hashing methods.
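To make the recipe the abstract outlines concrete, below is a minimal NumPy sketch of the general supervised cross-modal hashing family it belongs to: pairwise similarities derived from class labels, a unified binary code aligned with both kinds of semantics, and a bit-wise discrete update that keeps the codes in {-1, +1} with no continuous relaxation. This is an illustrative sketch under assumed settings, not DJSAH's actual formulation; all dimensions, the trade-off `alpha`, the ridge regularizer, and the linear hash functions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes hypothetical): n paired samples, image/text features,
# c classes, r-bit codes.
n, d_img, d_txt, c, r = 200, 64, 32, 5, 16
X = rng.standard_normal((n, d_img))          # image features
T = rng.standard_normal((n, d_txt))          # text features
Y = np.eye(c)[rng.integers(0, c, n)]         # one-hot class labels

# Pairwise semantic similarity from labels: S_ij = 1 if labels overlap, else -1.
S = np.where(Y @ Y.T > 0, 1.0, -1.0)

# Unified binary codes B in {-1,+1}^(n x r), learned so that (1/r) * B B^T
# approximates S while a linear classifier W maps codes back to labels
# (a joint alignment of the two kinds of semantics).
B = np.sign(rng.standard_normal((n, r)))

alpha = 1.0  # assumed trade-off between the similarity and label terms
for it in range(20):
    # Closed-form ridge regression of labels onto the current codes.
    W = np.linalg.solve(B.T @ B + 1e-3 * np.eye(r), B.T @ Y)
    # Bit-wise discrete update (discrete cyclic coordinate descent style):
    # update one bit column at a time with the others fixed, so B stays
    # binary and no relaxation is needed.
    for k in range(r):
        B_rest = np.delete(B, k, axis=1)
        W_rest = np.delete(W, k, axis=0)
        # Ascent direction for column k, combining both objectives.
        q = (r * S - B_rest @ B_rest.T) @ B[:, k] / r \
            + alpha * (Y - B_rest @ W_rest) @ W[k]
        B[:, k] = np.where(q >= 0, 1.0, -1.0)

# Modality-specific linear hash functions regressed onto the unified codes;
# at query time a new sample x is hashed as sign(x @ P).
P_img = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d_img), X.T @ B)
P_txt = np.linalg.solve(T.T @ T + 1e-3 * np.eye(d_txt), T.T @ B)
codes_img = np.sign(X @ P_img)
print("bit agreement with unified codes:", (codes_img == B).mean())
```

The bit-wise loop is what keeps the per-iteration computation and memory linear in the number of bits: each column update reuses the fixed remaining columns rather than re-solving the full binary program.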
               