"2DSegFormer: 2-D Transformer Model for Semantic Segmentation on Aerial Images"

Two-dimensional position information of input tokens is essential for transformer-based semantic segmentation models, especially on high-resolution aerial images. However, recent transformer-based segmentation methods rely on position encoding to record position information, and most position encoding methods encode only the 1-D positions of tokens. Therefore, we propose a 2-D semantic transformer model (2DSegFormer) for semantic segmentation on aerial images. In 2DSegFormer, we design a novel 2-D positional attention to accurately record the 2-D position information required by the transformer. Furthermore, we design a dilated residual connection and use it in place of the skip connection in the deep stages to obtain a larger receptive field. Skip connections are retained in the shallow stages of 2DSegFormer to pass details to the corresponding stages in the decoder. Experimental results on the UAVid, Vaihingen, and AeroScapes datasets demonstrate the effectiveness of 2DSegFormer. Compared with state-of-the-art methods, 2DSegFormer shows better performance and strong robustness across all three datasets. In particular, 2DSegFormer-B2 achieves first place in the public ranking on the UAVid test set.
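
The motivation stated in the abstract, that 1-D token positions lose 2-D structure, can be illustrated with a small sketch. This is not the paper's 2-D positional attention; it only assumes that a feature map is flattened row by row into a token sequence. The grid width `WIDTH` and the helper names `to_2d` and `grid_distance` are hypothetical and chosen for illustration.

```python
def to_2d(index, width):
    """Recover (row, col) of a token from its 1-D position in a row-major flattened grid."""
    return divmod(index, width)


def grid_distance(a, b, width):
    """Chebyshev distance between two tokens measured on the 2-D grid."""
    (row_a, col_a), (row_b, col_b) = to_2d(a, width), to_2d(b, width)
    return max(abs(row_a - row_b), abs(col_a - col_b))


WIDTH = 4  # hypothetical feature map that is 4 tokens wide

# Tokens 3 and 4 are adjacent in 1-D but lie at opposite ends of neighbouring rows.
# Tokens 1 and 5 are 4 apart in 1-D but are direct vertical neighbours.
for a, b in [(3, 4), (1, 5)]:
    print(f"tokens {a},{b}: 1-D gap = {abs(a - b)}, 2-D gap = {grid_distance(a, b, WIDTH)}")
```

In this toy setting the 1-D gap and the 2-D gap disagree in both directions, which is the kind of mismatch a 2-D-aware position mechanism is meant to avoid.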

Keywords: semantic segmentation; segmentation aerial; segmentation; transformer model; position; aerial images

Journal Title: IEEE Transactions on Geoscience and Remote Sensing
Year Published: 2022
