The tracking-by-detection framework has received growing attention through its integration with convolutional neural networks (CNNs). Existing tracking-by-detection methods, however, fail to track objects with severe appearance variations. This is because the traditional convolutional operation is performed on fixed grids, and thus may not find the correct response when the object changes pose or the environmental conditions vary. In this paper, we propose a deformable convolution layer to enrich the target appearance representations in the tracking-by-detection framework. We aim to capture the target appearance variations via deformable convolution, which adaptively enhances the original features. In addition, we propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance. The feature representation enriched through deformable convolution helps the CNN classifier discriminate between the target object and the background. Extensive experiments on standard benchmarks show that the proposed tracker performs favorably against state-of-the-art methods.
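To make the two ideas in the abstract concrete, the following is a minimal pure-Python sketch, not the implementation used in the paper. It shows (i) a 3x3 deformable convolution response at a single location, where each grid tap is shifted by a fractional offset and read with bilinear interpolation, and (ii) a gated fusion of an original feature with the deformable response. The function names, the single-channel layout, the zero padding outside the map, and in particular the fusion form `original + sigmoid(gate_logit) * variation` are illustrative assumptions; the abstract does not specify how the gate is computed.

```python
import math


def bilinear_sample(feat, y, x):
    """Read a 2D feature map (list of rows) at a fractional location.

    Locations outside the map contribute zero (assumed zero padding).
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value


def deformable_conv_at(feat, kernel, offsets, cy, cx):
    """3x3 deformable convolution response at (cy, cx).

    offsets holds nine (dy, dx) pairs in row-major order; each one shifts the
    corresponding tap of the regular 3x3 grid. With all offsets equal to
    (0, 0) this reduces to an ordinary fixed-grid convolution.
    """
    out = 0.0
    k = 0
    for gy in (-1, 0, 1):
        for gx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(feat, cy + gy + dy, cx + gx + dx)
            out += kernel[gy + 1][gx + 1] * sample
            k += 1
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(original, variation, gate_logit):
    """Assumed fusion form: the gate in (0, 1) scales how much of the
    deformable response is added to the original feature."""
    gate = sigmoid(gate_logit)
    return original + gate * variation


if __name__ == "__main__":
    feat = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    kernel = [[1.0] * 3 for _ in range(3)]
    fixed = deformable_conv_at(feat, kernel, [(0.0, 0.0)] * 9, 1, 1)
    shifted = deformable_conv_at(feat, kernel, [(0.0, 0.5)] * 9, 1, 1)
    print(fixed, shifted, gated_fusion(fixed, shifted, 0.0))
```

In a full tracker the offsets and the gate would be predicted by learned layers over multi-channel feature maps; here they are passed in by hand so that the sampling and fusion steps can be inspected in isolation.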