In remotely sensed images, high intraclass variance and interclass similarity are ubiquitous due to complex scenes and objects with multivariate features, making semantic segmentation a challenging task. Deep convolutional neural… Click to show full abstract
In remotely sensed images, high intraclass variance and interclass similarity are ubiquitous due to complex scenes and objects with multivariate features, making semantic segmentation a challenging task. Deep convolutional neural networks can solve this problem by modeling the context of features and improving their discriminability. However, current learning paradigms model the feature affinity in spatial dimension and channel dimension separately and then fuse them in a sequential or parallel manner, leading to suboptimal performance. In this study, we first analyze this problem practically and summarize it as attention bias that reduces the capability of network in distinguishing weak and discretely distributed objects from wide-range objects with internal connectivity, when modeled only in spatial or channel domain. To jointly model both spatial and channel affinity, we design a synergistic attention module (SAM), which allows for channelwise affinity extraction while preserving spatial details. In addition, we propose a synergistic attention perception neural network (SAPNet) for the semantic segmentation of remote sensing images. The hierarchical-embedded synergistic attention perception module aggregates SAM-refined features and decoded features. As a result, SAPNet enriches inference clues with desired spatial and channel details. Experiments on three benchmark datasets show that SAPNet is competitive in accuracy and adaptability compared with state-of-the-art methods. The experiments also validate the hypothesis of attention bias and the efficiency of SAM.
               
Click one of the above tabs to view related content.