Image-level weakly supervised semantic segmentation (WSSS) methods have greatly facilitated the extraction of buildings from remote-sensing (RS) images. However, because image-level labels lack the locations and extents of individual buildings, such methods are limited, especially in cases of cluttered backgrounds and buildings of diverse shapes and sizes. In this article, a novel WSSS model that exploits bounding-box annotations is developed to improve building extraction from RS images. Specifically, during the training phase, a multiscale feature retrieval (MFR) module is designed to learn multiscale building features and suppress background noise inside the bounding boxes. In the inference phase, multiscale class activation maps (CAMs) are generated from the multiscale features to localize buildings accurately. Finally, a pseudo-mask generation and correction (PGC) module refines the CAMs to generate and correct the building pseudo-masks. The proposed model is evaluated on three datasets: the WHU aerial building dataset, the CrowdAI building dataset, and a self-annotated building dataset. Experimental results demonstrate that the proposed method outperforms the baselines, achieving intersection-over-union (IoU) scores of 76.99%, 75.51%, and 67.35% on the three challenging datasets, respectively. This article provides a methodological reference for applying weakly supervised learning to RS images.
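The multiscale CAM fusion described above can be sketched as follows. This is a minimal, illustrative NumPy implementation assuming the standard CAM formulation (a weighted sum of feature-map channels by the classifier weights); the feature shapes, fusion by averaging, and the final threshold into a pseudo-mask are plausible assumptions, not the paper's actual MFR or PGC modules.

```python
import numpy as np

def cam(features, weights):
    """Standard class activation map: channel-weighted sum of features.

    features: (C, H, W) feature map at one scale
    weights:  (C,) classifier weights for the 'building' class
    Returns a (H, W) map normalized to [0, 1].
    """
    m = np.tensordot(weights, features, axes=([0], [0]))  # (H, W)
    m -= m.min()
    if m.max() > 0:
        m /= m.max()
    return m

def upsample_nearest(m, out_h, out_w):
    """Nearest-neighbour upsampling to a common output resolution."""
    h, w = m.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return m[np.ix_(rows, cols)]

def multiscale_cam(feature_maps, weight_list):
    """Fuse CAMs from several scales by averaging at the finest scale."""
    cams = [cam(f, w) for f, w in zip(feature_maps, weight_list)]
    out_h = max(c.shape[0] for c in cams)
    out_w = max(c.shape[1] for c in cams)
    return np.mean([upsample_nearest(c, out_h, out_w) for c in cams], axis=0)

# Toy example: random features at three scales (hypothetical shapes).
rng = np.random.default_rng(0)
feats = [rng.random((8, s, s)) for s in (16, 32, 64)]
ws = [rng.random(8) for _ in feats]
fused = multiscale_cam(feats, ws)

# A binary building pseudo-mask can then be obtained by thresholding
# (0.5 here is an illustrative choice, not the paper's PGC procedure).
pseudo_mask = (fused > 0.5).astype(np.uint8)
print(fused.shape, pseudo_mask.dtype)
```

In practice the fused map would be upsampled to the input-image resolution and refined (e.g., constrained to the annotated bounding boxes) before serving as a pseudo-mask for training the segmentation network.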