Deep-learning-based methods have significantly improved visual tracking precision. However, background distraction and high-precision localization remain challenging problems. Although some methods fuse deep- and shallow-layer features to address these problems, existing fusion schemes, such as simply concatenating or adding features from different layers, cannot fully exploit the advantages of both. In this paper, we propose a new adaptive feature fusion method, called the instance-based feature pyramid (IBFP), to obtain a discriminative high-resolution feature that inherits the discriminative information of the deep-layer feature while keeping the high-precision localization information of the shallow-layer feature. To use the deep and shallow features effectively, we design an instance-based upsampling (IBU) module to fuse them and a compressed space channel selection (CSCS) module to re-weight the feature channels adaptively. We insert the IBU and CSCS modules into a Siamese tracker for end-to-end training and testing. With the proposed IBU and CSCS modules, the deep and shallow features are fused in series. Experiments on large-scale benchmark datasets demonstrate that the proposed modules improve the tracker's ability to distinguish targets from similar distractors and that the tracker performs favorably against state-of-the-art methods.
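
To make the contrast between simple and adaptive fusion concrete, the following is a minimal NumPy sketch of the general pattern the abstract refers to: a low-resolution deep feature map is upsampled and added to a high-resolution shallow one (the "simply adding" baseline), and the result is then re-weighted per channel. It is not the IBU or CSCS module, whose internals the abstract does not specify. The nearest-neighbor upsampling, the squeeze-and-excitation-style gate, the names `upsample_nearest`, `channel_gate`, and `fuse`, the tensor shapes, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def channel_gate(x, w1, w2):
    """Generic channel re-weighting (squeeze-and-excitation style).

    Stand-in for an adaptive channel selection step; NOT the CSCS module.
    """
    squeezed = x.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # bottleneck + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid -> values in (0, 1)
    return x * gates[:, None, None], gates

def fuse(deep, shallow, w1, w2):
    """Upsample the deep map, add it to the shallow map, then gate channels.

    Assumes both maps already have the same channel count and that the
    shallow resolution is an integer multiple of the deep resolution.
    """
    factor = shallow.shape[1] // deep.shape[1]
    fused = shallow + upsample_nearest(deep, factor)  # plain additive fusion
    return channel_gate(fused, w1, w2)

# Hypothetical shapes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
C = 8
deep = rng.standard_normal((C, 4, 4))      # low resolution, semantic
shallow = rng.standard_normal((C, 8, 8))   # high resolution, localization
w1 = rng.standard_normal((C // 2, C)) * 0.1
w2 = rng.standard_normal((C, C // 2)) * 0.1

out, gates = fuse(deep, shallow, w1, w2)
print(out.shape, gates.shape)
```

In a trained tracker the gate weights would be learned end to end with the rest of the network, so that channels useful for separating the target from similar distractors receive larger weights; here they are random and serve only to show the data flow.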
               