Contrastive self-supervised learning (CSSL) has achieved promising results in extracting visual features from unlabeled data. However, most current CSSL methods learn low-resolution global image features, which are neither suitable nor efficient for pixel-level tasks. In this paper, we propose a coarse-to-fine CSSL framework based on a novel contrasting strategy to address this problem. It consists of two stages: one for encoder pre-training to learn global features, and the other for decoder pre-training to derive local features. First, the novel contrasting strategy exploits the spatial structure and semantic meaning of different regions, and thus provides more cues to learn from than strategies relying only on data augmentation. Specifically, a positive pair is built from two nearby patches sampled along the direction of the texture, provided they fall into the same cluster; a negative pair is generated from different clusters. When this contrasting strategy is applied within the coarse-to-fine framework, global and local features are learned successively by pulling the members of a positive pair close to each other and pushing those of a negative pair apart in an embedding space. Second, a discriminant constraint is incorporated into the per-pixel classification model to maximize the inter-class distance, making the model more competent at distinguishing between categories with similar appearance. Finally, the proposed method is validated on four SAR images for land-cover classification with limited labeled data, where it substantially improves the results. Comparisons with state-of-the-art methods demonstrate its effectiveness on pixel-level tasks.
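The cluster-based contrasting strategy described above can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the snippet below assumes an InfoNCE-style objective: for an anchor patch embedding, the positive is a nearby same-cluster patch, the negatives are all patches from other clusters, and the loss pulls the positive close while pushing negatives apart. All function and variable names (`cluster_contrastive_loss`, `emb`, `clusters`) are hypothetical, not from the paper.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between rows of a (m, d) and b (n, d)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def cluster_contrastive_loss(emb, clusters, anchor_idx, pos_idx, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective):
    - positive: a nearby patch from the SAME cluster as the anchor,
    - negatives: all patches from DIFFERENT clusters.
    emb: (n, d) patch embeddings; clusters: (n,) cluster labels."""
    sims = cosine_sim(emb[anchor_idx:anchor_idx + 1], emb)[0] / temperature
    neg_mask = clusters != clusters[anchor_idx]          # negatives from other clusters
    logits = np.concatenate(([sims[pos_idx]], sims[neg_mask]))
    # softmax cross-entropy with the positive at index 0
    logits = logits - logits.max()                       # numerical stability
    return -logits[0] + np.log(np.exp(logits).sum())
```

Minimizing this loss over many (anchor, positive) pairs drives same-cluster nearby patches together and different-cluster patches apart in the embedding space, which is the behavior the framework relies on in both its coarse (encoder) and fine (decoder) stages.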