Hashing techniques have attracted considerable attention owing to their efficient computation and economical storage. However, generating compact binary codes that retain strong retrieval performance remains challenging. In this paper, we propose a novel contrastive vision transformer hashing method that seamlessly integrates contrastive learning and vision transformers (ViTs) with hashing in a carefully designed model to learn informative features and compact binary codes simultaneously. First, we modify the basic contrastive learning framework by adding several hash layers to meet the specific requirements of hash learning. In our hash network, ViTs serve as the feature-learning backbone, which is rarely done in existing hash learning methods. Then, we design a multi-objective loss function in which a contrastive loss learns discriminative features by maximizing agreement between different augmented views of the same image, a similarity preservation loss enforces pairwise semantic consistency to strengthen the representational power of the hash codes, and a quantization loss controls the quantization error. The model is therefore trained jointly end to end to improve retrieval performance. Encouraging experimental results on three widely used benchmark databases demonstrate the superiority of our algorithm over several state-of-the-art hashing algorithms.
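The abstract names three loss terms that are combined for joint training. The paper's exact formulation is not given here, so the following is only a minimal pure-Python sketch of how such a multi-objective hashing loss could be assembled: an NT-Xent-style contrastive term over paired augmented views, a pairwise similarity preservation term against an assumed ±1 similarity matrix `S`, and a quantization term pushing relaxed code entries toward {-1, +1}. All function names and weightings (`alpha`, `beta`, `gamma`) are illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumes non-zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(z1, z2, temperature=0.5):
    """NT-Xent-style loss: each z1[i] should agree with its augmented view z2[i]
    and disagree with every other view in the batch (simplified sketch)."""
    n, views, loss = len(z1), z1 + z2, 0.0
    for i in range(n):
        pos = math.exp(cosine(z1[i], z2[i]) / temperature)
        denom = sum(math.exp(cosine(z1[i], v) / temperature)
                    for k, v in enumerate(views) if k != i)  # exclude self
        loss += -math.log(pos / denom)
    return loss / n

def similarity_preservation_loss(codes, S):
    """Pairwise semantic preservation: scaled inner products of relaxed codes
    should match the +1/-1 semantic similarity matrix S (assumed given)."""
    k, loss, pairs = len(codes[0]), 0.0, 0
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            inner = sum(a * b for a, b in zip(codes[i], codes[j])) / k
            loss += (inner - S[i][j]) ** 2
            pairs += 1
    return loss / pairs

def quantization_loss(codes):
    """Penalize distance of relaxed network outputs from the binary set {-1, +1}."""
    total = sum(len(c) for c in codes)
    return sum((abs(x) - 1.0) ** 2 for c in codes for x in c) / total

def total_loss(z1, z2, S, alpha=1.0, beta=1.0, gamma=0.1):
    """Hypothetical weighted combination of the three terms for joint training."""
    return (alpha * contrastive_loss(z1, z2)
            + beta * similarity_preservation_loss(z1, S)
            + gamma * quantization_loss(z1 + z2))
```

In an actual end-to-end model these terms would be computed on the differentiable outputs of the hash layers and backpropagated together; the sketch above only shows the arithmetic structure of combining the three objectives.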