
Meta‐feature based few‐shot Siamese learning for Urdu optical character recognition


A standard convolutional neural network (CNN) achieves a high level of accuracy in recognizing characters in different languages. However, like other deep neural networks, a CNN requires a substantial amount of training data. A lack of sufficient training data introduces dataset bias during learning, which degrades the CNN's performance. This limitation can be addressed by using few‐shot learners. In this research, a CNN‐based few‐shot Siamese learner is trained on meta‐features, extracted from Urdu text images using a novel graph‐based normal to tangent line (GNTL) technique, for Urdu optical character recognition (OCR) across different font sizes. The learner is trained on three corpora (datasets): one benchmark corpus, “Centre for Language Engineering Text Images”, and two corpora developed and released in this research, “Urdu Thickness Graphs” (UTG) and “Urdu OCR Font 16 to 36” (UOF). 80% of the data is used for training and the remaining 20% for testing. The UTG corpus is created by applying the proposed GNTL feature extraction technique, which yields a meta‐feature based corpus in the form of thickness graphs. The third corpus, UOF, covers five font sizes: 16, 20, 26, 30, and 36. The few‐shot Siamese learner is compared with a standard CNN trained on the same three corpora. The meta‐feature based few‐shot Siamese learner achieves promising recognition accuracy and outperforms the standard CNN by around 3 percentage points. On average, the few‐shot Siamese learner reaches 96.82%, while the standard CNN reaches 93.96%.
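To illustrate the general idea of a few‐shot Siamese learner, the following is a minimal sketch and not the authors' implementation. The abstract does not state the loss function or the decision rule, so both are assumptions here: a common contrastive loss over pairs of embeddings, and classification of a query by its mean distance to a few labelled support examples. The embeddings stand in for the output of a shared CNN branch applied to meta‐features; the function names, the margin value, and the toy vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss for one pair of embeddings (assumed loss, not from the paper).

    Pairs from the same class are pulled together; pairs from different
    classes are pushed apart until they are at least `margin` away.
    """
    d = euclidean(emb_a, emb_b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def classify_few_shot(query_emb, support):
    """Return the label whose support embeddings are closest on average.

    `support` maps each label to a short list of embeddings (the "shots").
    """
    best_label, best_dist = None, float("inf")
    for label, embeddings in support.items():
        mean_d = sum(euclidean(query_emb, e) for e in embeddings) / len(embeddings)
        if mean_d < best_dist:
            best_label, best_dist = label, mean_d
    return best_label


# Toy 2-D embeddings standing in for the output of a shared CNN branch.
# The labels are hypothetical character classes with two shots each.
support_set = {
    "alif": [[0.0, 0.1], [0.1, 0.0]],
    "bay": [[1.0, 1.0], [0.9, 1.1]],
}
predicted = classify_few_shot([0.05, 0.05], support_set)
```

In this toy setup the query lies next to the two "alif" examples, so `predicted` is "alif". In a real system the embeddings would come from a trained network, and the number of shots per class would depend on the corpus.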

Keywords: recognition; siamese learner; cnn; feature; shot siamese; based shot

Journal Title: Computational Intelligence
Year Published: 2022
