Abstract Existing state-of-the-art object detectors still typically rely on feature pyramid networks (FPN) and multiscale feature fusion. The traditional FPN fusion strategy is based on the top-down fusion of high-level semantic information. This top-down fusion generally uses interpolation-based upsampling, which often results in jagged edges, mosaic distortion, and edge blurring. Moreover, to improve accuracy, FPN-based fusion strategies must add multiple top-down components, which increases computational cost and leads to a poor balance between precision and speed. In this paper, we propose a novel fusion strategy based on a backbone network. We aim to design simple and efficient components for high-quality object detection. Our proposed strategy, bi-directional skip connection FPN (BiSCFPN), consists of three components: a bi-directional skip connection (BiSC), a selective dilated convolution module (SDCM), and sub-pixel convolution (SP). The BiSC enhances semantic information between different feature layers in the backbone network, while the SDCM improves the receptive fields for targets of different sizes in the fusion stage. Finally, SP learns the relationship between the features of upsampled and downsampled images, effectively mitigating the problems caused by traditional interpolation. BiSCFPN achieves an average precision of 38.2% on the Microsoft Common Objects in Context (MS COCO) test-dev dataset at a real-time speed of ~50 FPS (608 × 608) on an Nvidia GeForce RTX 2080 Ti graphics card, significantly improving the balance between precision and speed.
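To make two of the building blocks named in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that SP follows the common sub-pixel convolution pattern, in which a learned convolution produces `C * r * r` channels that are then rearranged into a feature map `r` times larger in each spatial dimension; only that rearrangement step is shown. It also includes the standard formula for the effective kernel size of a dilated convolution, as background for the SDCM. The function names `pixel_shuffle` and `effective_kernel_size` are hypothetical and chosen for illustration.

```python
def pixel_shuffle(x, r):
    """Rearrange a nested list of shape (C*r*r, H, W) into (C, H*r, W*r).

    Illustrative only: in a sub-pixel convolution layer, a learned
    convolution (not shown) would produce the C*r*r input channels.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Each output pixel is copied from one of the r*r input
                # channels belonging to output channel c.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


# Four 1x1 channels become a single 2x2 channel (upscale factor r = 2).
low_res = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(low_res, 2))  # [[[1, 2], [3, 4]]]

# A 3x3 kernel with dilation 1, 2, 3 spans 3, 5, 7 pixels per side.
print([effective_kernel_size(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
```

Unlike interpolation, the rearrangement itself introduces no fixed smoothing kernel: every output pixel comes from a channel that the preceding convolution can learn, which is the property the abstract appeals to when contrasting SP with interpolation-based upsampling.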
               