Semantic segmentation plays a critical role in scene understanding for self-driving vehicles. A line of work has shown that global context matters in urban scene segmentation because of large scale variations. However, we find that existing methods suffer from local ambiguities when they sacrifice continuous local context, i.e., when they pursue a huge receptive field of global cues through coarse pooling. To this end, this paper proposes a new Context Aggregation Module (CAM) with two primary components: context encoding, which replaces coarse pooling with encoder-decoders at appropriate sampling scales, and gated fusion, which extends the gate attention mechanism to balance contexts of different scales during feature fusion. Removing coarse pooling in favor of encoder-decoders retains the merit of exploring global context while avoiding the drawback of losing local contextual continuity. We then construct a Context Aggregation Network (CANet) and conduct extensive evaluations on the challenging autonomous driving benchmarks Cityscapes, CamVid, and BDD100K. Consistent improvements across all three benchmarks demonstrate the effectiveness of our approach. Notably, we attain a competitive mIoU of 82.7% on Cityscapes and a state-of-the-art mIoU of 80.5% on CamVid.
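To make the described design concrete, below is a minimal PyTorch sketch of what a CAM-style block could look like: multi-scale encoder-decoder branches in place of coarse pooling, combined by a learned gate. The branch strides, channel widths, softmax gating, and residual connection are assumptions made for illustration; the paper's actual architecture may differ.

```python
# Hypothetical sketch of a Context Aggregation Module (CAM); details are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderDecoderBranch(nn.Module):
    """Captures context at one sampling scale with a strided conv (encoder)
    and bilinear upsampling (decoder) instead of coarse pooling."""

    def __init__(self, channels: int, stride: int):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        ctx = self.encode(x)
        # Decoder: restore spatial resolution, preserving local continuity.
        return F.interpolate(ctx, size=(h, w), mode="bilinear", align_corners=False)


class ContextAggregationModule(nn.Module):
    """Aggregates multi-scale context via gated fusion of encoder-decoder branches."""

    def __init__(self, channels: int, strides=(2, 4, 8)):  # sampling scales assumed
        super().__init__()
        self.branches = nn.ModuleList(
            EncoderDecoderBranch(channels, s) for s in strides
        )
        # Gate: per-pixel attention weights that balance the context scales.
        self.gate = nn.Conv2d(channels * len(strides), len(strides), 1)
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctxs = [branch(x) for branch in self.branches]
        gates = torch.softmax(self.gate(torch.cat(ctxs, dim=1)), dim=1)
        fused = sum(gates[:, i:i + 1] * ctx for i, ctx in enumerate(ctxs))
        return x + self.project(fused)


# Usage: plug the module into a backbone's feature maps.
cam = ContextAggregationModule(channels=256)
feats = torch.randn(1, 256, 64, 128)
out = cam(feats)  # shape: (1, 256, 64, 128)
```

The key design point the abstract emphasizes is visible here: the encoder-decoder branches enlarge the receptive field like pooling would, but the learned decoder upsampling keeps local context continuous, and the gate lets the network weight each scale per pixel rather than averaging them uniformly.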