Recently, deep-learning-based methods for scene classification have become increasingly mature in remote sensing. However, training a strong deep learning model for remote sensing scene classification requires a large number of labeled samples, so scene classification with insufficient scene images remains a challenge. The DeepEMD network is currently among the most popular models for such few-shot tasks. Although DeepEMD obtains impressive results on common few-shot baseline datasets, it is insufficient for capturing discriminative feature information about the scene from both global and local perspectives. For this reason, this paper proposes an efficient few-shot scene classification scheme for remote sensing that combines multiple attention mechanisms and an attention-reference mechanism with the DeepEMD network. First, scene features are extracted by a backbone that incorporates a global attention module and a local attention module, which enables the backbone to capture discriminative information at both the global and local levels. Second, the attention-reference mechanism generates the weights of the elements in the earth mover's distance (EMD) formulation, which effectively alleviates the effects of complex backgrounds and intra-class morphological differences. Experimental results on three popular remote sensing benchmark datasets, Aerial Image Dataset (AID), OPTIMAL-31, and UC Merced, show that the proposed scheme obtains state-of-the-art results in few-shot remote sensing scene classification.
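The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the general idea of EMD-based matching between two sets of local scene features, with node weights generated by a cross-reference-style mechanism. All function names, parameters, and the Sinkhorn approximation (used here instead of the exact linear-programming solver employed by DeepEMD) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def node_weights(feats_a, feats_b):
    """Cross-reference-style node weights (an assumption, in the spirit of
    DeepEMD): each local feature of one image is weighted by its similarity
    to the mean feature of the other image, then normalized to sum to 1."""
    proto_b = feats_b.mean(dim=0, keepdim=True)            # (1, D)
    w = torch.relu(feats_a @ proto_b.t()).squeeze(-1)      # (Na,)
    return w / (w.sum() + 1e-8)

def emd_similarity(feats_a, feats_b, iters=50, eps=0.05):
    """Approximate EMD matching between two sets of local features using
    entropic-regularized optimal transport (Sinkhorn-Knopp iterations)."""
    feats_a = F.normalize(feats_a, dim=-1)                 # (Na, D)
    feats_b = F.normalize(feats_b, dim=-1)                 # (Nb, D)
    cost = 1.0 - feats_a @ feats_b.t()                     # (Na, Nb) cosine cost
    wa = node_weights(feats_a, feats_b)                    # (Na,)
    wb = node_weights(feats_b, feats_a)                    # (Nb,)

    K = torch.exp(-cost / eps)
    u = torch.ones_like(wa)
    for _ in range(iters):                                 # Sinkhorn updates
        u = wa / (K @ (wb / (K.t() @ u)))
    v = wb / (K.t() @ u)
    flow = u.unsqueeze(1) * K * v.unsqueeze(0)             # transport plan
    return (flow * (1.0 - cost)).sum()                     # matching score

# Toy usage: 25 local embeddings (e.g., a 5x5 feature map) of dimension 64 per image
a, b = torch.randn(25, 64), torch.randn(25, 64)
print(emd_similarity(a, b))
```

In this sketch the query image is classified by computing the matching score against the support features of each class and picking the highest one; the paper's attention-reference mechanism would replace the simple mean-feature weighting shown in node_weights with an attention-derived weighting.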
               