Detecting objects in aerial images is challenging because instances vary widely in scale, appear in arbitrary orientations, and are often tiny. This paper proposes MStrans, a multi-scale transformer-based aerial object detector, to address these challenges. To detect remote instances, MStrans adopts a multi-scale patch embedding transformer (MViT) that extracts the global features of the image effectively. Because the classification and regression branches rely on different discriminative features, a partial interactive fusion module (PIFM) is designed to strengthen the semantic expression of the key features for each task by interactively modeling the features of adjacent layers. In addition, since a transformer may degrade local feature detail while capturing long-distance feature dependencies, the paper designs a global-to-local interactive fusion module (GLIFM), which exploits the ability of convolution to extract local features and uses them to enrich the detail information in the transformer. Experiments on the DOTA and DIOR datasets show that MStrans achieves superior detection performance compared with other approaches.
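The abstract does not give the details of the multi-scale patch embedding, so the following is only a minimal sketch of the general idea, not the MViT design: the same image is cut into non-overlapping patches at several patch sizes, which would then be projected to tokens by a learned layer (omitted here). The function names `split_into_patches` and `multi_scale_patches`, the grayscale nested-list image, and the patch sizes 2 and 4 are all assumptions made for illustration.

```python
from typing import Dict, List

Image = List[List[float]]  # H x W grayscale image as nested lists (assumption)


def split_into_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Cut the image into non-overlapping square patches, each flattened row-major."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def multi_scale_patches(image: Image, patch_sizes: List[int]) -> Dict[int, List[List[float]]]:
    """Hypothetical helper: map each patch size to its own patch sequence."""
    return {size: split_into_patches(image, size) for size in patch_sizes}


# Toy 8 x 8 image whose pixel value encodes its position.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = multi_scale_patches(image, [2, 4])
```

Small patches give many tokens with fine spatial detail, which suits tiny instances, while large patches give few tokens that each cover more context. A detector of this kind would combine both sequences after embedding them.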
               