Convolutional neural networks (CNNs), with their excellent spatial feature extraction abilities, have become popular in remote sensing (RS) image change detection (CD). However, CNNs often focus on extracting spatial information and ignore the spectral and temporal sequences that are important for hyperspectral images (HSIs). In this article, we propose a joint spectral, spatial, and temporal transformer for hyperspectral image change detection (HSI-CD), named SST-Former. First, the SST-Former position-encodes each pixel of the cube to retain the spectral and spatial sequence order. Second, a spectral transformer encoder extracts spectral sequence information. Then, a class token, which stores the class information of a single-temporal HSI, is concatenated with the output of the spectral transformer encoder. Next, a spatial transformer encoder extracts spatial texture information. Finally, the features of the different temporal HSIs are fed into a temporal transformer, which extracts useful CD features between the current HSI pair, and a multilayer perceptron (MLP) produces the binary CD result. We evaluate the SST-Former on three HSI-CD datasets through numerous experiments, showing that it performs better than other excellent methods both visually and quantitatively. The code for this work will be available at https://github.com/yanhengwang-heu/IEEE_TGRS_SSTFormer.
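The data flow described above (position encoding, spectral encoder, class token, spatial encoder, temporal transformer, MLP) can be illustrated with a small, dependency-free sketch. This is not the released SST-Former code. It assumes each input is a patch stored as `[bands][pixels]`, replaces every transformer encoder with a single-head self-attention that has no learned projections, uses a toy position code, and stands in for the MLP with one linear layer plus a sigmoid. All function names, shapes, and weights are hypothetical and chosen only to show which axis acts as the token sequence at each stage.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V (assumption: no learned
    weights); a stand-in for a full transformer encoder."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def add_positions(cube):
    """Toy position code depending on band index b and pixel index p
    (assumption: the actual encoding used by the paper may differ)."""
    return [[x + 0.1 * math.sin(b + 1) + 0.1 * math.cos(p + 1)
             for p, x in enumerate(row)] for b, row in enumerate(cube)]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def encode_date(cube):
    """Map one single-date patch [bands][pixels] to one feature vector."""
    spectral = self_attention(add_positions(cube))   # tokens = bands
    spatial_tokens = transpose(spectral)             # tokens = pixels, dim = bands
    cls = [0.0] * len(spatial_tokens[0])             # class token (learned in practice)
    spatial = self_attention([cls] + spatial_tokens) # class token joins the sequence
    return spatial[0]                                # class-token output

def detect_change(cube_t1, cube_t2, weights, bias):
    """Fuse the two dates with attention over time, then apply a linear head."""
    fused = self_attention([encode_date(cube_t1), encode_date(cube_t2)])  # tokens = dates
    flat = fused[0] + fused[1]
    logit = sum(w * x for w, x in zip(weights, flat)) + bias
    return 1.0 / (1.0 + math.exp(-logit))            # probability of "changed"

# Synthetic 3x3 patch with 4 bands; band 2 is shifted at the second date.
bands, pixels = 4, 9
t1 = [[0.1 * (b + p) for p in range(pixels)] for b in range(bands)]
t2 = [[v + (0.5 if b == 2 else 0.0) for v in row] for b, row in enumerate(t1)]
weights = [0.1] * (2 * bands)                        # untrained, illustrative values
prob = detect_change(t1, t2, weights, bias=0.0)
print(f"change probability (untrained, illustrative only): {prob:.3f}")
```

Because nothing here is trained, the printed probability carries no meaning. The point is the order of operations: attention first runs along the spectral axis, then along the spatial axis with a class token added, and finally across the two acquisition dates.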
               