Deep learning-based methods have been widely applied to remote sensing scene classification. Recently, researchers have focused more on clarifying the basis of a model's decision. For example, class activation mapping (CAM) provides evidence for a prediction by highlighting the related area in an image. However, interpreting remote sensing scene classification is more challenging than interpreting natural image classification, since remote sensing images usually contain more complicated objects. As a result, CAM visual interpretation with traditional convolutional neural networks cannot accurately locate all target objects, so some important objects are ignored. In this letter, we propose a novel model, named the encoder-classifier-reconstruction CAM (ECR-CAM) neural network, to provide a more precise visual explanation. Specifically, ECR-CAM consists of four modules: an encoder module, a classifier module, a reconstruction module, and a CAM module. The encoder module extracts image features, and the classifier module generates predictions. The reconstruction module is the key to locating more target objects. It uses the extracted features to reconstruct the input image, which is a pixel-level process. The reconstruction process forces the features to retain important information about all objects, which the classification task alone cannot achieve. Finally, the CAM module, working from these more informative features, can show more target objects. Experimental results show that our model not only improves classification performance but also locates the target objects more accurately.
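As a rough illustration of the CAM step described above, the following minimal sketch computes a class activation map in its standard form: a weighted sum of the encoder's feature maps, using the weights of a linear classifier applied after global average pooling. It is not the authors' implementation. All function names are hypothetical, and the mean-squared-error reconstruction loss and the weighted joint objective are assumptions, since the abstract does not state which losses are used or how they are combined.

```python
from typing import List

Matrix = List[List[float]]  # one H x W feature map or single-channel image


def global_average_pool(features: List[Matrix]) -> List[float]:
    """Average each of the K feature maps over its H x W grid."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in features
    ]


def class_scores(features: List[Matrix], weights: Matrix) -> List[float]:
    """Linear classifier on pooled features; weights[c][k] links map k to class c."""
    pooled = global_average_pool(features)
    return [sum(w * p for w, p in zip(class_w, pooled)) for class_w in weights]


def class_activation_map(
    features: List[Matrix], weights: Matrix, class_idx: int
) -> Matrix:
    """Standard CAM: cam[y][x] = sum over k of weights[c][k] * features[k][y][x]."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_k, fmap in zip(weights[class_idx], features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_k * fmap[y][x]
    return cam


def reconstruction_loss(image: Matrix, reconstruction: Matrix) -> float:
    """Pixel-level mean squared error (assumed; the abstract names no loss)."""
    total = sum(
        (a - b) ** 2
        for row_a, row_b in zip(image, reconstruction)
        for a, b in zip(row_a, row_b)
    )
    return total / (len(image) * len(image[0]))


def joint_loss(cls_loss: float, rec_loss: float, rec_weight: float = 1.0) -> float:
    """Hypothetical joint objective: classification plus weighted reconstruction."""
    return cls_loss + rec_weight * rec_loss


# Toy example: K = 2 feature maps of size 2 x 2, and 2 classes.
features = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
weights = [[1.0, 0.5], [0.0, 1.0]]
cam_class0 = class_activation_map(features, weights, class_idx=0)
```

With global average pooling and no bias term, the spatial mean of a class's CAM equals that class's score, which is why the map can be read as a per-location breakdown of the prediction. A reconstruction term of the kind sketched here penalizes features that discard pixel-level detail, which matches the abstract's argument for why the features retain information about more objects.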
               