CHARM: An Improved Method for Chinese Precoding and Character-Level Embedding

Converting text into numerical form is a key step in natural language processing, and word embedding models are the most representative approach. Word embeddings, however, represent out-of-vocabulary and low-frequency words poorly, a gap that character-level embedding models help fill. Most Chinese character-level models use features such as strokes, radicals, and pinyin independently, or exploit only shallow correlations between some of them, so the inherent correlations among pronunciation, glyph, stroke order, and word frequency are not fully used. Based on statistical analyses of these features of Chinese characters, this paper proposes a precoding method built on the Character Helix Alternative Representation Model (CHARM), which maps Chinese characters or words reversibly to English-like sequences. The method is evaluated on three tasks: text classification, named entity recognition, and machine translation. Experimental results on several test sets show that the model performs well and that the precoded text can serve as a character-level replacement corpus for the original Chinese text.

Keywords: charm; level; character level; model; level embedding

Journal Title: IEEE Access
Year Published: 2021
