The development of single-modality target tracking based on visible light has been limited in recent years because visible-light images are highly susceptible to environmental and illumination conditions. Thermal infrared images compensate well for this weakness, so RGBT tracking has attracted increasing attention. However, existing studies aggregate multimodal information only through feature-level fusion, ignoring the role of decision-level fusion in tracking, and the original re-detection algorithm in the baseline model is prone to error accumulation. To address these problems, we propose the Redetection Multimodal Fusion Network (RMFNet). The network is divided into three branches: a visible-light branch, a thermal infrared branch, and a fusion branch. This three-branch structure directly exploits the complementary advantages of multimodal information as well as the commonalities and modality-specific characteristics of the two modalities. We propose a multimodal feature fusion module (EFM) that adaptively estimates the reliability of each modality and performs a weighted fusion of the two modalities' features. We also improve the existing re-detection algorithm by adding a global-search re-detection mechanism over the current frame, which reduces error accumulation. We have conducted extensive comparative experiments on two widely used benchmark datasets, GTOT and RGBT234. The experimental results show that RMFNet outperforms other tracking methods.
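
The abstract does not specify how EFM computes modality reliability, so the following is only a minimal sketch of the general idea of adaptive reliability-weighted fusion of two modality features. The class name `ModalityWeightedFusion`, the pooling-plus-MLP scorer, and the feature shapes are all illustrative assumptions, not the paper's actual EFM design.

```python
# Minimal sketch: predict a per-modality reliability weight and fuse
# RGB and thermal infrared feature maps by weighted sum (assumed design).
import torch
import torch.nn as nn


class ModalityWeightedFusion(nn.Module):
    """Scores each modality's reliability and fuses two feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        # Small MLP that maps a pooled feature descriptor to a scalar score.
        self.scorer = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 1),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_tir: torch.Tensor) -> torch.Tensor:
        # Global average pooling -> one (B, C) descriptor per modality.
        desc_rgb = feat_rgb.mean(dim=(2, 3))
        desc_tir = feat_tir.mean(dim=(2, 3))
        # Reliability logits, normalized across the two modalities.
        logits = torch.cat([self.scorer(desc_rgb), self.scorer(desc_tir)], dim=1)
        w = torch.softmax(logits, dim=1)  # (B, 2); the two weights sum to 1
        w_rgb = w[:, 0].view(-1, 1, 1, 1)
        w_tir = w[:, 1].view(-1, 1, 1, 1)
        # Weighted fusion of the two modality feature maps.
        return w_rgb * feat_rgb + w_tir * feat_tir


# Usage: fuse RGB and thermal feature maps of shape (B, C, H, W).
fuse = ModalityWeightedFusion(channels=256)
fused = fuse(torch.randn(2, 256, 28, 28), torch.randn(2, 256, 28, 28))
```

A softmax over per-modality scores is one common way to realize "adaptive reliability"; it keeps the fused feature on the same scale as the inputs, so a degraded modality (e.g., RGB at night) can be smoothly down-weighted rather than discarded.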