Hyperspectral image (HSI) classification is a critical task with numerous applications in remote sensing. Although convolutional neural networks have achieved remarkable success in computer vision, their small receptive fields limit their ability to model long-range dependencies. Recently, vision transformers have been applied to HSI classification, where multi-head self-attention (MHSA), the key feature extractor of transformers, learns global dependencies across distant positions and bands of HSI pixels. However, when existing vision transformers classify HSIs with a large number of bands, the features extracted by MHSA may exhibit over-dispersion. In this article, we propose a Group-Aware Hierarchical Transformer (GAHT) for HSI classification, which confines MHSA to the local spatial–spectral context by introducing a new grouped pixel embedding (GPE) module. The GPE emphasizes local relationships within HSI spectral channels, so that the model captures spatial–spectral context in a global–local fashion. In addition, we construct our transformer in a hierarchical manner, which can significantly improve classification accuracy with only a few parameters. Extensive experiments on four benchmark HSI datasets demonstrate that the proposed method outperforms state-of-the-art HSI classification algorithms. The source code is available at https://github.com/MeiShaohui/Group-Aware-Hierarchical-Transformer.
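To illustrate the idea of restricting feature extraction to local groups of spectral channels, the following is a minimal NumPy sketch of one possible grouped embedding. It is an assumption made for illustration, not the GPE module from the linked repository: the function name `grouped_pixel_embedding`, the contiguous band grouping, and the per-group linear projection (equivalent in effect to a grouped 1×1 convolution) are all hypothetical choices.

```python
import numpy as np

def grouped_pixel_embedding(patch, weights):
    """Embed each contiguous band group of an HSI patch independently.

    Hypothetical sketch, not the authors' implementation.
    patch:   array of shape (bands, height, width)
    weights: array of shape (groups, out_per_group, bands_per_group),
             one assumed projection matrix per band group
    returns: array of shape (groups * out_per_group, height, width)
    """
    bands, height, width = patch.shape
    groups, out_per_group, bands_per_group = weights.shape
    if groups * bands_per_group != bands:
        raise ValueError("bands must equal groups * bands_per_group")
    # Split the spectral axis into contiguous groups: (g, b, h, w).
    grouped = patch.reshape(groups, bands_per_group, height, width)
    # Project every group with its own matrix; bands never mix across groups.
    embedded = np.einsum("gob,gbhw->gohw", weights, grouped)
    return embedded.reshape(groups * out_per_group, height, width)

rng = np.random.default_rng(0)
patch = rng.normal(size=(8, 5, 5))     # toy patch: 8 bands, 5 x 5 pixels
weights = rng.normal(size=(4, 3, 2))   # 4 groups, 2 bands in, 3 features out
tokens = grouped_pixel_embedding(patch, weights)
print(tokens.shape)  # (12, 5, 5)
```

Because each output feature depends only on the bands of its own group, changing the bands of one group leaves the embeddings of all other groups untouched. That is the local spectral behavior this sketch is meant to show.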