With the growing use of minimally invasive surgical procedures, endoscopic video archives are expanding at a rapid pace. Efficient access to relevant content in such large multimedia archives requires compact and discriminative visual features for indexing and matching. In this paper, we present an effective method to represent images using salient convolutional features. Convolutional kernels from the first layer of a pre-trained convolutional neural network (CNN) are analyzed and clustered into multiple distinct groups based on their sensitivity to colors and textures. Dominant features detected by each cluster are collected into a single, layout-preserving feature map using a spatial maximal activator pooling (SMAP) approach. A moving-window-based structured pooling method then captures spatial layout features and global shape information from the aggregated feature map to populate feature histograms. Finally, the individual histograms for each cluster are combined into a single comprehensive feature histogram. Clustering the convolutional feature space allows extraction of color and texture features of varying strengths. Further, the SMAP approach enables us to select dominant discriminative features. The proposed features are compact and capable of outperforming several existing feature extraction approaches in retrieval and classification tasks on an endoscopy image dataset.
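
The pooling stages described above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: SMAP is read as keeping, at each spatial location, the index and response of the most strongly activated kernel within a cluster, and the structured pooling is read as counting those winning indices inside a sliding window. The function names (`smap`, `window_histograms`, `describe`) and the `window` and `stride` parameters are hypothetical, and the kernel clustering and convolution steps are assumed to have already produced one response array per cluster.

```python
import numpy as np


def smap(responses):
    """Assumed reading of SMAP for one kernel cluster.

    responses: array of shape (K, H, W), the activations of K kernels.
    Returns the (H, W) index of the strongest kernel at each location
    and the (H, W) map of that strongest response.
    """
    winner = responses.argmax(axis=0)
    strength = responses.max(axis=0)
    return winner, strength


def window_histograms(winner, n_kernels, window, stride):
    """Slide a square window over the winner map and count, per window,
    how often each kernel index won. Window histograms are concatenated
    so that coarse spatial layout is preserved."""
    height, width = winner.shape
    hists = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = winner[top:top + window, left:left + window]
            hists.append(np.bincount(patch.ravel(), minlength=n_kernels))
    return np.concatenate(hists)


def describe(cluster_responses, window=4, stride=4):
    """Combine the per-cluster histograms into one feature vector.

    cluster_responses: list of (K_i, H, W) arrays, one per cluster.
    Each cluster's histogram is L1-normalized before concatenation.
    """
    parts = []
    for responses in cluster_responses:
        winner, _ = smap(responses)
        hist = window_histograms(winner, responses.shape[0], window, stride)
        hist = hist.astype(float)
        parts.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical clusters with 3 and 5 kernels on an 8x8 response grid.
    clusters = [rng.random((3, 8, 8)), rng.random((5, 8, 8))]
    feature = describe(clusters)
    print(feature.shape)
```

With an 8x8 response grid and a non-overlapping 4x4 window, each cluster contributes four window histograms, so the descriptor length grows with the number of kernels per cluster and the number of windows rather than with image resolution.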
               