Different approaches were proposed to design deep CNNs for semantic segmentation. Usually, they are built upon an encoder–decoder architecture and require computationally expensive operations on high-resolution activation maps. Since for… Click to show full abstract
Different approaches were proposed to design deep CNNs for semantic segmentation. Usually, they are built upon an encoder–decoder architecture and require computationally expensive operations on high-resolution activation maps. Since for real-time segmentation the costs are critical, efficient approaches compromise spatial information to achieve real-time segmentation but with a considerable drop in accuracy. We introduce a new module based on depthwise separable, shuffled and grouped convolutions that optimize up-sampling operations by using a sizeable receptive field and preserving spatial information. Then, we designed an efficient network based on dense connectivity to achieve a remarkable trade-off accuracy and speed. We show through set of experiments that even by up-sampling with a lightweight decoder, our applied architecture scores on Cityscape 69.5% Mean IoU with 1024×512\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1024\times 512$$\end{document} inputs and 95.2 FPS on the test set.
               
Click one of the above tabs to view related content.