Numerous deep-learning methods have been successfully applied to semantic segmentation (SS) and height estimation (EH) of remote-sensing imagery. It has also been proved that such a framework can be reusable… Click to show full abstract
Numerous deep-learning methods have been successfully applied to semantic segmentation (SS) and height estimation (EH) of remote-sensing imagery. It has also been proved that such a framework can be reusable for multiple tasks to reduce computational resource overhead. However, there are still some technical limitations due to the semantic inconsistency between 3-D and 2-D features and the strong interference of different objects with similar spectral–spatial properties. Previous works have sought to address these issues through hard (HPS) or soft parameter sharing (SPS) schemes. But due to unintentional integration, the specific information transmitted between multiple tasks is not clear or in a lot of redundancy. Furthermore, tuning the weights by hand between classification and regression loss functions is challenging. In this article, a novel multitask learning (MTL) method, termed associatively segmenting semantics and estimating height (ASSEH), is proposed to associatively segment semantics and estimate height from monocular remote-sensing imagery. First, considering semantic inconsistency across tasks, we design a task-specific distillation (TSD) module containing a set of task-specific gating units (TSGUs) for each task at the cost of fewer parameters. The module allows for task-specific features to be tailored from the backbone while allowing for task-shared features to be transmitted. Second, we leverage the proposed cross-task propagation (CTP) module to construct and diffuse the local pattern graphlets at the common positions across tasks. Such a high-order recursive method can bridge two tasks explicitly to effectively settle semantic ambiguities caused by similar spectral characteristics with less computational burden and memory requirements. Third, a dynamic weighted geometric mean (DWGeoMean) strategy is introduced to dynamically learn the weights of each task and be more robust to the magnitude of the loss function. Finally, the results of the ISPRS Vaihingen dataset and the urban semantic 3-D (US3D) dataset well demonstrate that our ASSEH achieves state-of-the-art performance.
               
Click one of the above tabs to view related content.