Object detection is a critical component of autonomous driving perception. To achieve comprehensive environmental perception, mainstream methods commonly rely on multimodal sensor fusion. However, existing solutions often face challenges such as low sensor utilization and suboptimal fusion strategies. To address these issues, this article proposes MSAFusion, a multisensor adaptive fusion framework based on a bird’s eye view (BEV). In our framework, we extract multiview features using Vision Mamba (Vim), generate BEV queries through positional encoding for preliminary fusion with multimodal features, and employ a deep Q-network (DQN) for adaptive fusion based on feature consistency and continuity. This approach enables efficient utilization of multimodal sensors and optimal fusion across diverse environments. Extensive experiments on the nuScenes and Radiate datasets demonstrate that MSAFusion achieves state-of-the-art performance, delivering superior panoramic environmental perception, improved object detection accuracy, and enhanced flexibility compared to existing multisensor fusion methods.
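To make the described pipeline more concrete, the sketch below illustrates one plausible reading of the three stages named in the abstract: a Vision Mamba (Vim) style multiview encoder, BEV queries with positional encoding for preliminary fusion, and a DQN-style head that selects fusion weights from modality consistency. All module names, tensor shapes, the action space, and the consistency "state" are assumptions made for illustration; none of these details are specified in the abstract, and the stand-in encoder is ordinary convolutional code rather than an actual Mamba implementation.

```python
# Hedged sketch of an MSAFusion-style pipeline; shapes and logic are assumed,
# not taken from the paper.
import torch
import torch.nn as nn


class VimBackboneStub(nn.Module):
    """Hypothetical stand-in for a Vision Mamba (Vim) multiview image encoder."""
    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=4, stride=4),
            nn.GELU(),
            nn.AdaptiveAvgPool2d((50, 50)),   # coarse BEV-sized grid (assumed)
        )

    def forward(self, imgs):                        # imgs: (B, V, 3, H, W)
        b, v = imgs.shape[:2]
        feats = self.encoder(imgs.flatten(0, 1))    # (B*V, dim, 50, 50)
        return feats.view(b, v, *feats.shape[1:]).mean(dim=1)   # pool views


class BEVQueryFusion(nn.Module):
    """Preliminary fusion: learned BEV queries plus positional encoding attend
    over per-modality BEV features (assumed cross-attention formulation)."""
    def __init__(self, dim=256, bev_hw=50):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(bev_hw * bev_hw, dim))
        self.pos_enc = nn.Parameter(torch.randn(bev_hw * bev_hw, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, bev_feat):                    # bev_feat: (B, dim, H, W)
        b = bev_feat.shape[0]
        kv = bev_feat.flatten(2).transpose(1, 2)    # (B, H*W, dim)
        q = (self.queries + self.pos_enc).unsqueeze(0).expand(b, -1, -1)
        fused, _ = self.attn(q, kv, kv)
        return fused                                # (B, H*W, dim)


class FusionQNet(nn.Module):
    """DQN-style head: maps a feature-consistency 'state' to Q-values over a
    discrete set of fusion-weight actions (action space is assumed)."""
    def __init__(self, dim=256, n_actions=5):
        super().__init__()
        self.q_head = nn.Sequential(
            nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )
        # Candidate camera/radar mixing weights the agent can choose from.
        self.register_buffer("weights", torch.linspace(0.0, 1.0, n_actions))

    def forward(self, cam_bev, radar_bev):          # both: (B, N, dim)
        # "State": per-sample summary of cross-modal feature consistency.
        state = (cam_bev - radar_bev).abs().mean(dim=1)    # (B, dim)
        q_values = self.q_head(state)                       # (B, n_actions)
        action = q_values.argmax(dim=-1)                    # greedy at inference
        w = self.weights[action].view(-1, 1, 1)
        return w * cam_bev + (1.0 - w) * radar_bev, q_values


if __name__ == "__main__":
    imgs = torch.randn(2, 6, 3, 224, 224)           # 6 surround-view cameras
    radar_bev = torch.randn(2, 2500, 256)           # pre-projected radar BEV features
    cam_bev = BEVQueryFusion()(VimBackboneStub()(imgs))
    fused, q = FusionQNet()(cam_bev, radar_bev)
    print(fused.shape, q.shape)                     # (2, 2500, 256) and (2, 5)
```

In this reading, the Q-network would be trained with a standard DQN objective whose reward reflects downstream detection quality, so that the fusion weights adapt to how consistent the camera and radar BEV features are in a given scene; the actual reward design and training procedure in MSAFusion are not described in the abstract.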