Abstract Multi-scene ancient Chinese text recognition (MACR) is a challenging task for people without relevant professional knowledge. Because of language barriers and the lack of related open datasets, there has been little research on MACR. In this paper, we establish a multi-scene ancient Chinese text (MACT) dataset comprising handwritten text, calligraphy, and natural scene text in ancient fonts; it includes synthetic samples generated for training and real scene samples collected for testing. We first run experiments on several CNN structures as the baseline method; the top-1 recognition result, 66.94%, is approximately 13.96% higher than subjective human recognition results. Furthermore, based on these models and the confidence scores from the baseline, we propose a multi-model ensemble (MME) method, which uses auxiliary datasets and a feature extraction method to augment the data before training, applies different hyper-parameters during training, and integrates multiple models to improve recognition accuracy. The top-1 accuracy of the MME method reaches 73.36%, and the other top-n results also greatly surpass the baseline results. The MACT dataset is publicly available on the website 1.
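To make the top-n metric and the idea of integrating several models concrete, the following is a minimal sketch, not the paper's MME method. It assumes each model outputs one confidence score per class and that the models are combined by simple averaging; the abstract does not specify the actual integration rule. The function names `average_scores` and `top_n_accuracy` and the toy scores are hypothetical.

```python
from typing import List, Sequence


def average_scores(
    model_scores: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-class confidence scores over several models.

    model_scores[m][i][c] is model m's confidence that sample i belongs
    to class c. Plain averaging is an assumption made for illustration.
    """
    n_models = len(model_scores)
    n_samples = len(model_scores[0])
    n_classes = len(model_scores[0][0])
    return [
        [
            sum(model_scores[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def top_n_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], n: int = 1
) -> float:
    """Fraction of samples whose true label is among the n highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest confidence.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:n]:
            hits += 1
    return hits / len(labels)


# Toy data: two hypothetical models, three samples, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.45, 0.25]]
model_b = [[0.3, 0.5, 0.2], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
labels = [0, 2, 1]

ensemble = average_scores([model_a, model_b])
top1_a = top_n_accuracy(model_a, labels, n=1)
top1_b = top_n_accuracy(model_b, labels, n=1)
top1_ensemble = top_n_accuracy(ensemble, labels, n=1)
top2_a = top_n_accuracy(model_a, labels, n=2)
```

In this toy case each model misclassifies a different sample, so the averaged scores recover both errors. That is the general motivation for ensembling, but it says nothing about the gains reported for MACT.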
               