Understanding the visual quality of a feature map plays a significant role in many active vision applications. Previous works mostly rely on object-level features, such as compactness, to estimate the quality score of a feature map. However, compactness has been applied to feature maps produced by salient object detection techniques, where the maps tend to be compact. As a result, the compactness feature fails when the feature maps are blurry (e.g., fixation maps). In this paper, we treat the estimation of the quality score of feature maps, specifically fixation maps, as a regression problem. After extracting several local, global, geometric, and positional features from a feature map, we train a random forest regressor to estimate the quality score of any unseen feature map. Our model is specifically tailored to estimate the quality of three types of maps: bottom-up, target, and contextual feature maps. These maps are produced for a large benchmark fixation data set of more than 900 challenging outdoor images. We demonstrate that our approach provides an accurate estimate of the quality of these feature maps when compared with the ground-truth data. In addition, we show that our proposed approach is useful in feature map integration for predicting human fixation. Instead of naively integrating all three feature maps when predicting human fixation, our approach dynamically selects the feature map with the highest estimated quality score on a per-image basis, thereby improving the fixation prediction accuracy.
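The pipeline described above (describe each map with a feature vector, regress a quality score with a random forest, then keep the highest-scoring map per image) can be illustrated with a minimal sketch. It assumes NumPy and scikit-learn are available. The six descriptors in `describe_map` are hypothetical stand-ins for the local, global, geometric, and positional features, which the abstract does not enumerate, and the helper names `fit_quality_model` and `select_best_map` are likewise illustrative. The training targets are assumed to be ground-truth quality scores supplied by the caller.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def describe_map(feature_map):
    """Return a small descriptor vector for one 2-D map.

    Hypothetical features, chosen only for illustration:
    mean, standard deviation, entropy, centroid (y, x), spatial spread.
    """
    m = np.asarray(feature_map, dtype=float)
    total = m.sum()
    # Normalise to a distribution; use a uniform one for an all-zero map.
    p = m / total if total > 0 else np.full(m.shape, 1.0 / m.size)
    rows, cols = np.indices(m.shape)
    # Pixel coordinates rescaled to [0, 1] so the features are size-independent.
    ys = rows / max(m.shape[0] - 1, 1)
    xs = cols / max(m.shape[1] - 1, 1)
    cy, cx = (p * ys).sum(), (p * xs).sum()            # positional
    spread = np.sqrt((p * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum())  # geometric
    nz = p[p > 0]
    entropy = -(nz * np.log(nz)).sum()                  # global
    return np.array([m.mean(), m.std(), entropy, cy, cx, spread])


def fit_quality_model(train_maps, train_scores, seed=0):
    """Fit a random forest regressor from map descriptors to quality scores."""
    X = np.stack([describe_map(m) for m in train_maps])
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, np.asarray(train_scores, dtype=float))
    return model


def select_best_map(model, candidates):
    """Pick the candidate map with the highest estimated quality.

    `candidates` maps a name (e.g. "bottom_up") to a 2-D array for one image.
    Returns the winning name and the estimated score of every candidate.
    """
    names = list(candidates)
    X = np.stack([describe_map(candidates[n]) for n in names])
    scores = model.predict(X)
    best = names[int(np.argmax(scores))]
    return best, dict(zip(names, scores))
```

In use, `fit_quality_model` would be called once on maps with known quality scores, and `select_best_map` would then be called per image with that image's bottom-up, target, and contextual maps. Selecting a single map rather than averaging all three mirrors the per-image selection described in the abstract.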
               