RGB and thermal infrared (RGBT) tracking has gradually become a research hotspot as a solution for tracking in complex environments. The strong complementarity between RGB and thermal infrared data enables trackers to work 24/7. Existing works usually adopt a symmetric network structure that applies an identical strategy to mine modalities with different properties, ignoring the heterogeneity between the modalities. In this article, we propose a novel asymmetric global–local mutual integration network that comprehensively considers an asymmetric structure, heterogeneity-based global association, and interframe communication. It consists of three parts: an asymmetric mode-distinguishing parallel structure (AMPS), cross-modal global–local interaction, and an interframe monitoring strategy (IMS). Specifically, the AMPS performs discriminative mining on the information of the two modalities by combining the discount module and the branch cement module, and it extracts multiscale cues through the multiscale auxiliary module to handle the challenges of scale variation and small objects. Then, the global mining module is deployed in the cross-modal global–local interaction part to jointly perform intramodal and intermodal global correlation, acting as a global complement to local feature extraction. Finally, the IMS employs a fast optical flow algorithm to detect interframe displacement, helping the network better handle camera motion and fast object motion. Extensive experiments on the GTOT, RGBT234, and LasHeR datasets verify the effectiveness of the proposed network, and ablation experiments further confirm the efficacy of the asymmetric structure and its components.
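The abstract does not say which fast optical flow algorithm the IMS uses, so the following is only a minimal sketch of what "detecting interframe displacement" can mean. It swaps in the classic Lucas–Kanade least-squares estimate of a single global translation (u, v) between two grayscale frames, solved from the brightness-constancy constraint Ix·u + Iy·v + It = 0. The function names `estimate_global_flow` and `monitor_motion`, the synthetic `make_frame` helper, and the `threshold` value are hypothetical illustrations, not the authors' code.

```python
import math


def make_frame(size=64, dx=0.0, dy=0.0, sigma=6.0):
    """Hypothetical test helper: a Gaussian blob whose centre is shifted by (dx, dy)."""
    c = size / 2.0
    return [
        [math.exp(-((x - c - dx) ** 2 + (y - c - dy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def estimate_global_flow(prev, curr):
    """Lucas-Kanade least squares for one translation (u, v) over the whole frame.

    This is a stand-in technique; the paper's actual optical flow method is unspecified.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # central-difference d/dx
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # central-difference d/dy
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:  # textureless frame: displacement is not observable
        return 0.0, 0.0
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt] by Cramer's rule.
    u = (-sxt * syy + sxy * syt) / det
    v = (-sxx * syt + sxy * sxt) / det
    return u, v


def monitor_motion(prev, curr, threshold=2.0):
    """Flag frames whose global displacement exceeds a hypothetical pixel threshold."""
    u, v = estimate_global_flow(prev, curr)
    return u, v, math.hypot(u, v) > threshold


if __name__ == "__main__":
    f0 = make_frame()
    f1 = make_frame(dx=0.5, dy=-0.3)
    print(monitor_motion(f0, f1))  # roughly (0.5, -0.3, False)
```

A single global translation is a deliberately coarse model: it captures camera shake or panning, which is the kind of interframe displacement the IMS targets, but a real tracker would more likely use a dense or pyramidal flow field to separate camera motion from fast object motion.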
               
Click one of the above tabs to view related content.