In recent years, RGBT tracking has become a hot topic in visual tracking and has made great progress. In this paper, we propose a novel Trident Fusion Network (TFNet) to achieve effective fusion of different modalities for robust RGBT tracking. Specifically, to exploit the complementarity of features from all convolutional layers, we propose a recursive strategy that densely aggregates these features to yield robust representations of target objects in both modalities. Moreover, we design a trident architecture that integrates the fused features with both modality-specific features for robust target representation. This design has three main advantages. First, retaining the classification layer of each modality enhances single-modality feature learning; compared with the aggregation branch, the single-modality branches focus more on mining modality-specific information. Second, when one modality is noisy or invalid, the modality-specific branches can still capture discriminative features for RGBT tracking. Third, integrating the aggregation branch with the single-modality branches facilitates complementary learning between modalities. In addition, we introduce a feature pruning module in each branch to remove redundant features and avoid network overfitting. Experimental results on four RGBT tracking benchmark datasets show that our tracker achieves superior performance against state-of-the-art RGBT tracking methods.
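The trident design described above can be sketched in minimal Python. This is an illustrative toy, not the paper's implementation: the function names (`dense_aggregate`, `prune`, `trident_forward`), the element-wise sum used for aggregation, and the magnitude threshold used for pruning are all assumptions standing in for the learned network components.

```python
# Hypothetical sketch of the trident idea: three parallel branches
# (RGB-specific, thermal-specific, and fused), each producing its own
# target representation. Features are plain lists of floats here,
# standing in for convolutional feature maps.

def dense_aggregate(layer_features):
    """Recursively accumulate features from all layers (element-wise sum),
    mimicking the dense aggregation of multi-layer features."""
    fused = layer_features[0]
    for feat in layer_features[1:]:
        fused = [a + b for a, b in zip(fused, feat)]
    return fused

def prune(feature, threshold=0.1):
    """Toy feature pruning: zero out low-magnitude responses,
    standing in for the redundancy-removing pruning module."""
    return [x if abs(x) > threshold else 0.0 for x in feature]

def trident_forward(rgb_layers, thermal_layers):
    """Return the three branch outputs: RGB-specific, thermal-specific,
    and the cross-modality fused representation."""
    rgb_feat = prune(dense_aggregate(rgb_layers))        # modality-specific branch
    thermal_feat = prune(dense_aggregate(thermal_layers))  # modality-specific branch
    # Fusion branch: combine both modalities' aggregated features.
    fused_feat = prune([r + t for r, t in zip(rgb_feat, thermal_feat)])
    return rgb_feat, thermal_feat, fused_feat
```

Keeping all three outputs (rather than only the fused one) is what lets a downstream tracker fall back on a modality-specific representation when the other modality is noisy or invalid.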