Crisscross-Global Vision Transformers Model for Very High Resolution Aerial Image Semantic Segmentation

Semantic segmentation is a key means of understanding very high resolution (VHR) aerial imagery. With the rapid development of deep learning, deep learning methods are being applied to the segmentation of VHR images, with convolutional neural networks (CNNs) as the basic framework. However, owing to the highly complex details present in VHR images and the high spatial dependence of geographical objects, CNN-based methods are inadequate: the inherent locality of CNNs limits the size of the receptive field and thus the ability to obtain long-range context information. To solve this problem, we propose a novel transformer-based deep learning model called crisscross-global vision transformers (CGVTs). CGVT exploits the transformer's inherent ability to obtain long-range context information to solve the restricted receptive field problem. Specifically, we redesign the self-attention (SA) mechanism in the transformer and call it crisscross-global attention. It consists of two parts: a crisscross transformer encoder block (CC-TEB) and a global squeeze transformer encoder block (GS-TEB). CC-TEB overcomes the limitation of the traditional SA design (specifically, the difficulty of applying it to VHR aerial image segmentation) and further increases the local feature representation ability of the model. GS-TEB increases the global feature representation ability of the model. Experiments conducted on the popular ISPRS Vaihingen, IEEE GRSS Data Fusion Contest Zeebrugge, and LoveDA semantic segmentation challenge datasets verify the effectiveness and superiority of the proposed method. Specifically, it achieved state-of-the-art performance on both the Zeebrugge and LoveDA datasets and is currently ranked second on the Vaihingen dataset.
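
The abstract does not say why traditional SA is hard to apply to VHR imagery. One plausible reading, offered here only as an assumption, is cost: full self-attention compares every pixel with every other pixel, whereas a crisscross pattern (each pixel attending only to its own row and column) compares far fewer. The sketch below counts query-key pairs for both patterns on a feature map. It illustrates the generic crisscross idea and is not the authors' CC-TEB; the function names and the 128 x 128 feature-map size are hypothetical.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every pixel attends to every pixel."""
    tokens = height * width
    return tokens * tokens


def crisscross_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when each pixel attends only to its row and column."""
    # A pixel's row has `width` positions and its column has `height`;
    # the pixel itself lies on both, so it is subtracted once.
    return height * width * (height + width - 1)


def crisscross_neighbors(row: int, col: int, height: int, width: int) -> list[tuple[int, int]]:
    """Positions sharing a row or column with (row, col), including itself."""
    same_row = [(row, c) for c in range(width)]
    same_col = [(r, col) for r in range(height) if r != row]
    return same_row + same_col


if __name__ == "__main__":
    h, w = 128, 128  # hypothetical feature-map size, not taken from the paper
    full = full_attention_pairs(h, w)
    cross = crisscross_attention_pairs(h, w)
    print(f"full self-attention pairs: {full}")
    print(f"crisscross pairs:          {cross}")
    print(f"ratio:                     {full / cross:.1f}x")
```

Under this assumption, the gap grows with image size, since the full pattern scales with the square of the pixel count while the crisscross pattern scales with the pixel count times the sum of the two side lengths. A row-and-column pattern alone does not connect every pair of pixels in one step, which is consistent with the abstract pairing CC-TEB with a separate global block.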

Keywords: semantic segmentation; high resolution; crisscross global; segmentation; global vision

Journal Title: IEEE Transactions on Geoscience and Remote Sensing
Year Published: 2023
