Semantic segmentation has been a crucial technology for practical applications such as autonomous driving. Recently, attempts have been made to improve semantic segmentation performance using depth information. However, most of these attempts have focused on indoor environments, for two reasons. First, it is relatively difficult to obtain accurate and dense depth information outdoors. Second, using depth information typically requires a network with a new structure, because processing depth as an input demands an additional encoder. To overcome the aforementioned difficulties, we propose a novel Depth and Pixel-distance based Attention (DPA) module, which utilizes depth information to compute the similarity between pixels. The similarity is computed based on the observation that pixels belonging to the same object have similar depth values. Because only the relative difference in depth is considered, the module is relatively robust to the accuracy of the provided depth information. Furthermore, DPA is a simple plug-in module that can be applied to existing RGB-based segmentation backbones. Since no encoder is added, it is much more efficient in terms of computation. We conduct extensive experiments on the Cityscapes dataset using various baseline architectures. Regardless of the baseline model, DPA yields meaningful performance improvements in semantic segmentation tasks. It is also computationally more efficient than methods that take depth information as an additional input.
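The abstract does not give the exact formulation, but the core idea — attention weights driven by relative depth differences and pixel distances rather than by an extra depth encoder — can be sketched as follows. All names and the specific similarity kernel (an exponential decay over the depth gap and spatial distance, with hypothetical temperatures `alpha` and `beta`) are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def depth_distance_attention(features, depth, alpha=1.0, beta=0.1):
    """Sketch of a depth- and pixel-distance-based attention step.

    features: (H, W, C) feature map from an RGB backbone.
    depth:    (H, W) depth map; only relative differences are used,
              so the absolute scale/accuracy of depth matters less.
    alpha, beta: hypothetical temperature weights (assumptions).
    """
    H, W, C = features.shape
    N = H * W
    feats = features.reshape(N, C)
    d = depth.reshape(N).astype(float)

    # Pixel coordinates for the spatial-distance term.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([ys.reshape(N), xs.reshape(N)], axis=1).astype(float)

    # Pixels on the same object tend to have similar depth,
    # so a small |d_i - d_j| should mean high similarity.
    depth_diff = np.abs(d[:, None] - d[None, :])
    pix_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Similarity decays with depth gap and spatial distance;
    # softmax-normalize each pixel's attention row.
    logits = -(alpha * depth_diff + beta * pix_dist)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention-weighted aggregation of backbone features.
    return (weights @ feats).reshape(H, W, C)
```

Because the weights depend only on depth differences and pixel offsets, the module reuses the existing RGB feature map and adds no second encoder, which matches the efficiency argument in the abstract.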