Abstract Although local descriptor research has shifted from hand-crafted designs to descriptors learned with convolutional neural networks (CNNs), the feature hierarchy inherent in a CNN has rarely been explored. To improve both the invariance and the discriminative power of CNN-based local descriptors, we exploit the complementary representational power of feature maps at different CNN levels and design a multi-level feature aggregation (MLFA) module that communicates information across pyramid levels effectively. After feature fusion, each level extracts a feature vector, and the final descriptor is the concatenation of these outputs. Moreover, to leverage the spatial structure within a local patch, we propose a novel spatial context pyramid (SCP) module that captures spatial information. SCP is designed in a residual manner and adds only a few parameters to the model. We implement our algorithm on top of the HardNet framework and carry out a comprehensive evaluation on the UBC Phototour, HPatches, and ETH datasets. The experimental results demonstrate that the proposed method performs favorably against state-of-the-art methods. An ablation study is also provided to show the effectiveness of each component.
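The final step described in the abstract, where each pyramid level contributes a feature vector and the descriptor concatenates them, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `concat_descriptor`, the toy values, and the L2 normalization of the concatenated vector are assumptions made for illustration (the abstract does not say how the output is normalized), and the MLFA fusion and SCP modules are not modeled.

```python
import math


def concat_descriptor(level_vectors):
    """Concatenate per-level feature vectors into one descriptor.

    Assumption: the concatenated vector is L2-normalized to unit length.
    The abstract states only that the outputs are concatenated.
    """
    # Flatten the per-level vectors in level order.
    flat = [float(x) for vec in level_vectors for x in vec]
    norm = math.sqrt(sum(x * x for x in flat))
    if norm == 0.0:
        # Avoid division by zero for an all-zero input.
        return flat
    return [x / norm for x in flat]


# Hypothetical toy vectors standing in for two pyramid levels.
levels = [[3.0, 0.0], [0.0, 4.0]]
descriptor = concat_descriptor(levels)
print(len(descriptor), descriptor)
```

The length of the resulting descriptor is the sum of the per-level vector lengths, so adding levels increases descriptor dimensionality unless each level's vector is made shorter.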
               