Although deep learning-based approaches have made significant progress in remote sensing (RS) image classification, the supervised learning paradigm performs poorly when labeled samples are scarce, which severely restricts classification performance. In this article, we investigate an effective self-supervised feature representation (SSFR) architecture for few-shot land cover classification of multimodal RS images. Specifically, we exploit a multiview learning strategy to construct multiple views from multimodal RS images, building several complementary views of the same observed scenes from hyperspectral images or from different modalities of RS data. Then, we build a deep feature extractor that learns high-level feature representations from each view via contrastive learning. Contrastive learning aggregates samples of the same scene while separating samples of different scenes in the latent space, and this process requires no labeled information. Moreover, to learn more robust features across the different views, we adopt a multitask learning strategy to train the feature extraction network. Finally, a lightweight machine learning method is employed to classify the learned features using only a few annotated samples. To further demonstrate the self-supervised feature learning capability of the proposed model, we train the feature representation network on multiple source datasets. Comprehensive feature learning and classification experiments confirm the effectiveness and superiority of the proposed method.
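To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of the two core steps the abstract describes: label-free contrastive pretraining on two complementary views of the same scenes, followed by few-shot classification of the learned features with a lightweight classifier. The band-group view construction, network sizes, and synthetic data are illustrative assumptions; the multitask training across views is simplified here to a single contrastive objective.

```python
# Minimal sketch (assumed, not the authors' code): multiview contrastive
# pretraining on unlabeled hyperspectral pixels, then few-shot classification.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression

class Encoder(nn.Module):
    """Small MLP mapping one view of a pixel to a feature vector."""
    def __init__(self, in_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
    def forward(self, x):
        return self.net(x)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss: pulls together the two views of the same
    scene and pushes apart views of different scenes; needs no labels."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2N, d)
    sim = z @ z.t() / tau                                   # cosine similarities
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # the positive for sample i is its other view at index (i + n) mod 2n
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy unlabeled data: 1000 "pixels" with 100 spectral bands (synthetic).
torch.manual_seed(0)
X = torch.randn(1000, 100)

# View construction by band-group splitting (an assumption): even vs. odd
# bands of the same pixel act as two complementary views of one scene.
view1, view2 = X[:, 0::2], X[:, 1::2]

enc1, enc2 = Encoder(view1.size(1)), Encoder(view2.size(1))
opt = torch.optim.Adam(list(enc1.parameters()) + list(enc2.parameters()), lr=1e-3)

for epoch in range(20):                                     # self-supervised pretraining
    opt.zero_grad()
    loss = nt_xent(enc1(view1), enc2(view2))
    loss.backward()
    opt.step()

# Few-shot stage: fit a lightweight classifier on a handful of labels.
with torch.no_grad():
    feats = torch.cat([enc1(view1), enc2(view2)], dim=1).numpy()
y = torch.randint(0, 5, (1000,)).numpy()                    # synthetic labels
few = list(range(0, 1000, 50))                              # e.g., 20 annotated samples
clf = LogisticRegression(max_iter=1000).fit(feats[few], y[few])
print("accuracy on all samples:", clf.score(feats, y))
```

The design point the sketch illustrates is the division of labor: the expensive representation network is trained entirely without labels, so the scarce annotations are spent only on the final lightweight classifier.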