Semantic segmentation is a key means of understanding very high resolution (VHR) aerial imagery. With the rapid development of deep learning, such methods are increasingly applied to the segmentation of VHR images, with convolutional neural networks (CNNs) as the basic framework. However, owing to the highly complex details present in VHR images and the high spatial dependence of geographical objects, CNN-based methods are inadequate: the inherent locality of CNNs limits the size of the receptive field and thus the ability to obtain long-range context information. To solve this problem, in this article, we propose a novel transformer-based deep learning model called crisscross-global vision transformers (CGVTs). CGVT exploits the transformer’s inherent ability to obtain long-range context information to overcome the restricted receptive field. Specifically, we redesign the self-attention (SA) mechanism in the transformer and call it crisscross-global attention. It consists of two parts: a crisscross transformer encoder block (CC-TEB) and a global squeeze transformer encoder block (GS-TEB). CC-TEB overcomes the limitation of the traditional SA design (specifically, the difficulty of applying it to VHR aerial image segmentation) and further increases the local feature representation ability of the model. GS-TEB increases the global feature representation ability of the model. Experiments conducted on the popular ISPRS Vaihingen, IEEE GRSS data fusion contest Zeebrugge, and LoveDA semantic segmentation challenge datasets verify the effectiveness and superiority of our proposed method. Specifically, it achieved state-of-the-art performance on both the Zeebrugge and LoveDA datasets and is currently ranked second on the Vaihingen dataset.
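The locality argument above can be made concrete with the standard receptive-field recurrence for stacked convolutions. The sketch below is illustrative only: the function name `receptive_field` and the layer configurations are hypothetical and are not taken from the article, and dilation and padding are ignored.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of convolutions.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Assumes no dilation; padding does not change the receptive field size.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent outputs of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) * jump
        jump *= stride             # striding spreads later outputs further apart
    return rf


# Hypothetical stacks, chosen only to show how slowly the field grows.
plain_stack = [(3, 1)] * 10    # ten 3x3 convolutions, stride 1
strided_stack = [(3, 2)] * 4   # four 3x3 convolutions, stride 2

print(receptive_field(plain_stack))    # 21
print(receptive_field(strided_stack))  # 31
```

Under these assumptions, even ten stacked 3x3 convolutions cover only 21 pixels along each axis, whereas a single self-attention layer relates every position to every other position in its input.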
               