Target detection in aerial images captured by unmanned aerial vehicles (UAVs) is currently one of the most widely used application scenarios. Compared with ordinary images, aerial images have more complex backgrounds and smaller targets, which leads to lower detection precision and a high false detection rate. This paper proposes TCA-YOLOv5m, a new small-target detection model based on YOLOv5m that combines the Transformer algorithm with the Coordinate Attention (CA) mechanism. In this model, the Transformer algorithm is added at the end of the YOLOv5 backbone, which enables the model to extract more feature information from images. In the neck of TCA-YOLOv5m, the Path Aggregation Network (PANet) is combined with the Transformer algorithm to strengthen the expressive capacity of the feature pyramid and improve detection precision for occluded, high-density small targets, and CA is introduced to locate targets more accurately in high-density scenes. In addition, TCA-YOLOv5m adds a detection layer to improve its ability to capture small targets. Experiments on the VisDrone 2019 dataset compare the detection precision and detection speed of the proposed model with those of baseline models. The results indicate that the detection precision of TCA-YOLOv5m reaches 97.4%, which is 5.2% higher than that of YOLOv5, and that mAP@50 reaches 58.5%, which is 14.8% higher than that of YOLOv5. TCA-YOLOv5m runs at 12.96 frames per second (FPS), which provides a degree of real-time performance. Therefore, TCA-YOLOv5m is suitable for detecting dense small targets in aerial images.
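The abstract reports mAP@50 and FPS without defining them. As a minimal sketch, assuming the conventional reading of mAP@50 (a prediction counts as a match when its Intersection over Union with a ground-truth box is at least 0.5), the Python below shows why small targets are sensitive to localization error and what the reported frame rate implies per frame. The function names and the example boxes are hypothetical illustrations, not taken from the paper or from its evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count at IoU 0.5."""
    return iou(pred_box, gt_box) >= threshold


def frame_latency_ms(fps):
    """Average time per frame, in milliseconds, implied by a frames-per-second figure."""
    return 1000.0 / fps


# Hypothetical 10x10-pixel target: a 3-pixel shift still matches, a 4-pixel shift does not.
gt = (0, 0, 10, 10)
print(is_match_at_50((3, 0, 13, 10), gt))  # IoU = 70 / 130
print(is_match_at_50((4, 0, 14, 10), gt))  # IoU = 60 / 140
print(round(frame_latency_ms(12.96), 2))   # per-frame time at the reported 12.96 f/s
```

For a 10x10-pixel box, one extra pixel of horizontal offset is enough to move a prediction from a match to a miss, which is consistent with the abstract's emphasis on locating small targets more accurately. At 12.96 f/s, the implied average processing time is roughly 77 ms per frame.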
               