Abstract
Deep model-based semantic segmentation has received ever-increasing research attention in recent years. However, because of their complex architectures, existing methods still cannot achieve high accuracy in real-time applications. In this paper, we propose a novel Sequential Prediction Network (termed SPNet) to seek a better trade-off between accuracy and efficiency. SPNet is an end-to-end encoder-decoder architecture that introduces a sequential prediction method to propagate contextual information from the low-level layers to the high-level layers. In addition, the proposed method is equipped with a stream Spatial Semantic and Edge Loss (termed SEL) and an adversarial network at multiple resolutions, which greatly improve segmentation accuracy with a negligible increase in computation cost. To further exploit extra unlabeled data, we present a knowledge distillation scheme that distills structured knowledge from cumbersome networks into compact ones. Without using any pre-trained model, our method achieves state-of-the-art performance among existing real-time segmentation models on several challenging datasets. Notably, on the Cityscapes test set, it obtains 75.8% mIoU at a speed of 61.2 FPS.
               