With the advent of the era of artificial intelligence, text detection is widely used in the real world. Because the receptive field of the neural network is limited, most existing scene text detection methods cannot accurately detect small, arbitrarily oriented text instances, and their detection rate for mutually adhering text instances is low, which makes them prone to false detections. To tackle these difficulties, in this paper we propose a new feature pyramid network for scene text detection, the Cross-Scale Attention Aggregation Feature Pyramid Network (CSAA-FPN). Specifically, we use an Attention Aggregation Feature Module (AAFM) to enhance features, which not only addresses the weak features and small receptive fields of lightweight networks but also better handles multi-scale information and accurately separates adjacent text instances. The attention module CBAM is introduced to focus on effective information so that the output feature layer carries richer and more accurate information. Furthermore, we design an Adaptive Fusion Module (AFM), which weights the output features and attends to pixel-level information to further refine the features. Experiments conducted on CTW1500, Total-Text, ICDAR2015, and MSRA-TD500 demonstrate the superiority of this model.
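The abstract does not give the internals of the AFM, so the following is only a generic sketch of pixel-wise weighted fusion of multi-scale feature maps, not the authors' module. It assumes the feature maps have already been resized to a common shape, and the function name `pixelwise_weighted_fusion` and the projection matrix `proj` (standing in for a learned 1x1 convolution per scale) are hypothetical names introduced for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def pixelwise_weighted_fusion(features, proj):
    """Fuse S feature maps of shape (C, H, W) with per-pixel weights.

    features: list of S arrays, all of shape (C, H, W) (assumed pre-resized).
    proj:     array of shape (S, C); hypothetical stand-in for a learned
              1x1 convolution that maps each scale's C channels to one score.
    Returns the fused map (C, H, W) and the weights (S, H, W).
    """
    stacked = np.stack(features)                        # (S, C, H, W)
    # One scalar score per scale and pixel (a 1x1 conv over channels).
    scores = np.einsum("schw,sc->shw", stacked, proj)   # (S, H, W)
    # Normalize across scales so the weights at each pixel sum to 1.
    weights = softmax(scores, axis=0)                   # (S, H, W)
    # Weighted sum over the scale axis, broadcast across channels.
    fused = (weights[:, None] * stacked).sum(axis=0)    # (C, H, W)
    return fused, weights


rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
proj = rng.standard_normal((3, 8))
fused, weights = pixelwise_weighted_fusion(feats, proj)
```

With an all-zero `proj`, every scale receives the same weight at every pixel and the fusion reduces to a plain average, which is a convenient sanity check for a scheme of this kind.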
               