LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Keyword Detection Based on RetinaNet and Transfer Learning for Personal Information Protection in Document Images

Photo by bermixstudio from unsplash

In this paper, a keyword detection scheme is proposed based on deep convolutional neural networks for personal information protection in document images. The proposed scheme is composed of key character… Click to show full abstract

In this paper, a keyword detection scheme is proposed based on deep convolutional neural networks for personal information protection in document images. The proposed scheme is composed of key character detection and lexicon analysis. The first part is the key character detection developed based on RetinaNet and transfer learning. To find the key characters, RetinaNet, which is composed of convolutional layers featuring a pyramid network and two subnets, is exploited to detect key characters within the region of interest in a document image. After the key character detection, the second part is a lexicon analysis, which analyzes and combines several key characters to find the keywords. To train the model of RetinaNet, synthetic image generation and data augmentation are exploited to yield a large image dataset. To evaluate the proposed scheme, many document images are selected for testing, and two performance measurements, IoU (Intersection Over Union) and mAP (Mean Average Precision), are used in this paper. Experimental results show that the mAP rates of the proposed scheme are 85.1% and 85.84% for key character detection and keyword detection, respectively. Furthermore, the proposed scheme is superior to Tesseract OCR (Optical Character Recognition) software for detecting the key characters in document images. The experimental results demonstrate that the proposed method can effectively localize and recognize these keywords within noisy document images with Mandarin Chinese words.

Keywords: retinanet; detection; keyword detection; document images; character

Journal Title: Applied Sciences
Year Published: 2021

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.