LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Novel up-scale feature aggregation for object detection in aerial images

Photo from wikipedia

Abstract Object detection is a pivotal task for many unmanned aerial vehicle (UAV) applications. Compared to general scenes, the objects in aerial images are typically much smaller. For this reason,… Click to show full abstract

Abstract Object detection is a pivotal task for many unmanned aerial vehicle (UAV) applications. Compared to general scenes, the objects in aerial images are typically much smaller. For this reason, most general object detectors suffer from two critical challenges while dealing with aerial images: 1) The widely exploited Feature Pyramid Network works by integrating high-level features to lower levels progressively. However, this manner does not transfer equivalent information from each level of backbone network to the generated features, and the shared detection head faces an unbalanced sources of information flow, damaging the detection accuracy. 2) Up-sampling is commonly used to expand feature resolution for feature fusion or feature aggregation. However, existing up-sampling methods are ineffective to reconstruct high resolution feature maps. To address these two challenges, two works are proposed: 1) An up-scale feature aggregation framework that fully utilizes multi-scale complementary information, and 2) a novel up-sampling method that further improve detection accuracy. These two proposals are integrated into an end-to-end single-stage object detector namely HawkNet. Extensive experiments are conducted on VisDrone-DET2018, UAVDT and DIOR datasets. Compared to the RetinaNet baseline, our HawkNet achieves absolute gains of 6.0%, 1.2% and 5.9% in average precision (AP) on VisDrone-DET2018, UAVDT and DIOR datasets, respectively. For a 800  ×  1333 input on the UAVDT dataset, HawkNet with ResNet-50 backbone surpasses existing methods for single-scale inference and achieves the best performance (37.4 AP), while operating at 10.6 frames per second on a single Nvidia GTX 1080Ti GPU.

Keywords: object detection; detection; feature; aerial images; feature aggregation

Journal Title: Neurocomputing
Year Published: 2020

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.