Salient object segmentation is an important computer vision problem with applications in numerous areas such as video surveillance, scene parsing, and autonomous navigation. For images, the task is challenging due to background clutter and texture, low resolution, and low contrast of the object(s) of interest. For videos, additional issues such as object deformation, camera motion, and the presence of multiple moving objects make foreground object segmentation a significantly more difficult and open problem. However, motion patterns can also act as an important cue for separating foreground objects from the background, and recent approaches exploit this by aggregating temporally perturbed information from a series of consecutive frames. For single images, this additional cue is unavailable. In this paper, we propose to emulate the effect of such perturbations by constructing a bag of multiple augmentations of a single input image. Saliency features are estimated independently from each perturbed image in the bag and are then combined using a novel aggregation strategy based on a convolutional gated recurrent encoder-decoder unit. Through extensive experiments on benchmark datasets, we show better or very competitive performance compared with state-of-the-art methods. We further observe that even a bag constructed using simple affine transformations achieves strong performance, demonstrating the robustness of the proposed framework.
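To make the described pipeline concrete, below is a minimal PyTorch sketch of the idea, not the paper's implementation: the names (BagSaliencyNet, ConvGRUCell, affine_bag), the two-layer encoder, the single-convolution decoder, and all layer sizes are illustrative assumptions. Only the overall structure, an augmentation bag whose per-image saliency features are fused by a convolutional gated recurrent unit, follows the abstract.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: gates are 3x3 convolutions, so the
    hidden state remains a spatial feature map (an assumption; the
    paper's exact recurrent unit may differ)."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3, padding=1)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

class BagSaliencyNet(nn.Module):
    """Per-image feature encoder -> ConvGRU aggregation over the
    augmentation bag -> single-channel saliency decoder."""
    def __init__(self, hid_ch=32):
        super().__init__()
        self.hid_ch = hid_ch
        self.encoder = nn.Sequential(
            nn.Conv2d(3, hid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hid_ch, hid_ch, 3, padding=1), nn.ReLU())
        self.gru = ConvGRUCell(hid_ch, hid_ch)
        self.decoder = nn.Conv2d(hid_ch, 1, 3, padding=1)

    def forward(self, bag):  # bag: (T, B, 3, H, W)
        b, (h, w) = bag.shape[1], bag.shape[-2:]
        state = bag.new_zeros(b, self.hid_ch, h, w)
        for frame in bag:  # fuse features one perturbed image at a time
            state = self.gru(self.encoder(frame), state)
        return torch.sigmoid(self.decoder(state))

def affine_bag(img, angles=(-10, 0, 10)):
    """Bag of simple affine (here: rotation) perturbations of one image."""
    return torch.stack([TF.rotate(img, float(a)) for a in angles])

img = torch.rand(1, 3, 64, 64)                # one RGB image, batch size 1
saliency = BagSaliencyNet()(affine_bag(img))  # (1, 1, 64, 64) map in [0, 1]
```

Building the bag from rotations mirrors the abstract's observation that even simple affine transformations suffice; in a real system the two-layer encoder would presumably be replaced by a pretrained saliency feature backbone.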