Learning-based multimodal data processing has attracted increasing interest in the remote sensing community owing to its robust performance. Although it is preferable to collect multiple modalities for training, not all of them are available in practical scenarios due to restrictions imposed by imaging conditions. Therefore, assisting model inference when modalities are missing is significant for multimodal remote sensing image processing. In this work, we propose a general framework called the modality-shared hallucination network (MSH-Net) to address this issue by reconstructing complete modality-shared features from the incomplete inference modalities. Compared to conventional privileged modality hallucination methods, MSH-Net not only helps preserve cross-modal interactions for model inference but also scales well as the number of missing modalities increases. We further develop a novel joint adaptation distillation (JAD) method that guides the hallucination model to learn modality-shared knowledge from the multimodal model by matching the joint probability distributions of representations and ground truth. This overcomes the representation heterogeneity caused by the discrepancy between the inputs and structures of the multimodal and hallucination models, while preserving the decision boundaries refined by multimodal cues. Finally, extensive experiments conducted on four common modality combinations demonstrate that the proposed MSH-Net can effectively address the problem of missing modalities and achieve state-of-the-art performance. Code is available at: https://github.com/shicaiwei123/MSHNet
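
To make the joint adaptation distillation idea more concrete, below is a minimal sketch of a distillation loss that matches the joint distribution of (representation, ground truth) between a multimodal teacher and a hallucination student, using an MMD over product kernels. This is an illustrative assumption of how such joint-distribution matching can be implemented, not the authors' exact JAD formulation; all names (rbf_kernel, jad_loss) and the kernel choices are hypothetical.

    # Hypothetical sketch: joint-distribution-matching distillation (assumed, not the official JAD code).
    import torch
    import torch.nn.functional as F

    def rbf_kernel(x, y, sigma=1.0):
        """RBF kernel matrix between the rows of x and y."""
        dist = torch.cdist(x, y) ** 2
        return torch.exp(-dist / (2 * sigma ** 2))

    def jad_loss(teacher_feat, student_feat, labels, num_classes, sigma=1.0):
        """Match the joint distributions P(teacher_feat, y) and P(student_feat, y)
        with an MMD over product kernels k_z * k_y (one common choice; the paper's
        exact formulation may differ)."""
        y = F.one_hot(labels, num_classes).float()
        k_y = rbf_kernel(y, y, sigma)                                   # label kernel, shared by both joints
        k_tt = rbf_kernel(teacher_feat, teacher_feat, sigma) * k_y      # teacher-teacher joint kernel
        k_ss = rbf_kernel(student_feat, student_feat, sigma) * k_y      # student-student joint kernel
        k_ts = rbf_kernel(teacher_feat, student_feat, sigma) * k_y      # cross joint kernel
        return k_tt.mean() + k_ss.mean() - 2 * k_ts.mean()              # squared MMD between the two joints

    # Toy usage: teacher features come from the frozen multimodal model,
    # student features from the hallucination branch fed with the available modalities.
    if __name__ == "__main__":
        teacher_feat = torch.randn(8, 128)   # modality-shared teacher features
        student_feat = torch.randn(8, 128)   # hallucinated student features
        labels = torch.randint(0, 10, (8,))
        print(jad_loss(teacher_feat, student_feat, labels, num_classes=10))

Conditioning the feature kernels on a shared label kernel is what ties the match to the decision boundaries: two feature sets can only agree under this loss if samples of the same class are embedded consistently by teacher and student.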