Semantic segmentation plays a vital role in autonomous vehicles. Fusing the rich details of RGB images with the illumination robustness of thermal images has great potential to improve the performance of RGB-T semantic segmentation. However, current multispectral feature fusion methods are less effective at characterizing the correlation and complementarity between RGB and thermal features. To generate robust cross-spectral fusion features, we propose a multispectral fusion transformer network (MFTNet). Specifically, we first design an MFT module in the multispectral fusion encoder to model the intra-spectral correlation and the inter-spectral complementarity of RGB-T inputs; the MFT module effectively enhances the RGB-T feature representation under various challenging conditions. Then, an optimization strategy with a progressive deep supervision (PDS) loss is proposed to directly supervise both the upper and lower layers of the decoder, guiding it toward precise segmentation in a coarse-to-fine manner. Finally, extensive experiments demonstrate the effectiveness of our method: on the MFNet dataset, MFTNet achieves 74.7 mAcc and 57.3 mIoU, outperforming state-of-the-art methods.
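The abstract does not spell out the internals of the MFT module, so the snippet below is only a minimal sketch: it assumes a transformer-style fusion block in which each modality applies self-attention to its own tokens (intra-spectral correlation) and cross-attention to the other modality's tokens (inter-spectral complementarity). The class name CrossSpectralFusionBlock, the head count, and the sum-then-normalize fusion are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of cross-spectral fusion with self- and cross-attention.
# This is NOT the paper's MFT module; names, shapes, and the fusion rule are assumptions.
import torch
import torch.nn as nn


class CrossSpectralFusionBlock(nn.Module):
    """Fuses RGB and thermal feature maps of shape (B, C, H, W).

    Each modality attends to itself (intra-spectral correlation) and to the
    other modality (inter-spectral complementarity); the results are summed
    and normalized to produce a single fused feature map for the decoder.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.self_attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.self_attn_thermal = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.cross_attn_rgb = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.cross_attn_thermal = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        r = rgb.flatten(2).transpose(1, 2)
        t = thermal.flatten(2).transpose(1, 2)
        # Intra-spectral: each modality refines its own tokens.
        r_intra, _ = self.self_attn_rgb(r, r, r)
        t_intra, _ = self.self_attn_thermal(t, t, t)
        # Inter-spectral: RGB queries thermal tokens and vice versa.
        r_cross, _ = self.cross_attn_rgb(r, t, t)
        t_cross, _ = self.cross_attn_thermal(t, r, r)
        fused = self.norm(r_intra + t_intra + r_cross + t_cross)
        # Restore the spatial layout for the segmentation decoder: (B, C, H, W).
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = CrossSpectralFusionBlock(channels=64)
    rgb_feat = torch.randn(2, 64, 30, 40)
    thermal_feat = torch.randn(2, 64, 30, 40)
    print(block(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 30, 40])
```

In this sketch the fused tokens could then be passed to decoder stages at multiple depths, each with its own auxiliary segmentation head, which is one plausible (but here assumed) way to realize the coarse-to-fine progressive deep supervision described above.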
               