Word recommendation for English composition using big corpus data processing

Writing essays and technical documents can be challenging for many people, especially non-native speakers. Good content and ideas are important, but good writing also requires clear and effective expressions that accurately convey those ideas to readers. Writers often have difficulty selecting words that fit their sentences. Proper words tend to be widely used words that appear in similar contexts, and they can be identified by statistical analysis of a corpus, that is, a large collection of sentences. This paper proposes a method that recommends suitable words in response to word pattern queries, which are expressed as a combination of words, part-of-speech (POS) tags, and wild card words, such as ‘ {1:2} idea.’ For the POS tags in a word pattern query, the proposed method recommends words along with their popularity and example sentences from the corpus. To facilitate such query processing, the method first performs POS tagging on all the sentences in the corpus. From the tagged sentences, it generates 2-grams up to 5-grams, which consist of words, POS tags, and the special wild card word symbol ‘*’. It then builds an inverted file-like data structure that keeps the relevant information for each potential word pattern from the n-grams. Because of the large number of word patterns and sentences, MapReduce algorithms are developed to realize the proposed method, and HBase is deployed to manage the inverted file-like data structure. Some experimental results are presented to show the characteristics of the proposed method.
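
The pipeline described in the abstract (tag the sentences, expand each n-gram into word patterns, index the patterns, then answer a query) can be illustrated with a small single-machine sketch. This is not the paper's implementation: the tiny hand-tagged corpus, the `<TAG>` notation for POS tags in a pattern, the function names, and the use of an in-memory dictionary in place of MapReduce and HBase are all assumptions made for illustration. Popularity is approximated here as the number of sentences containing a match, and the `{1:2}` range syntax from the abstract's example query is not modeled.

```python
from collections import defaultdict
from itertools import product

# Hypothetical pre-tagged corpus: each sentence is a list of (word, POS) pairs.
# A real pipeline would get these from a POS tagger; here they are hand-assigned.
TAGGED = [
    [("she", "PRP"), ("had", "VBD"), ("a", "DT"), ("good", "JJ"), ("idea", "NN")],
    [("he", "PRP"), ("had", "VBD"), ("a", "DT"), ("great", "JJ"), ("idea", "NN")],
    [("it", "PRP"), ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("idea", "NN")],
]

WILDCARD = "*"


def patterns_for_ngram(ngram):
    """Yield every pattern in which each position is the word, its <TAG>, or '*'."""
    choices = [(word, f"<{tag}>", WILDCARD) for word, tag in ngram]
    return product(*choices)


def build_index(tagged_sentences, min_n=2, max_n=5):
    """Map pattern -> concrete word tuple -> set of ids of sentences containing it."""
    index = defaultdict(lambda: defaultdict(set))
    for sid, sentence in enumerate(tagged_sentences):
        for n in range(min_n, max_n + 1):
            for start in range(len(sentence) - n + 1):
                ngram = sentence[start:start + n]
                words = tuple(word for word, _ in ngram)
                for pattern in patterns_for_ngram(ngram):
                    index[pattern][words].add(sid)
    return index


def recommend(index, pattern):
    """Return (words, sentence_count) pairs for a pattern, most popular first."""
    matches = index.get(tuple(pattern), {})
    ranked = [(words, len(sids)) for words, sids in matches.items()]
    # Sort by descending popularity, then alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


index = build_index(TAGGED)
for words, count in recommend(index, ("a", "<JJ>", "idea")):
    print(" ".join(words), count)
# prints:
# a good idea 2
# a great idea 1
```

Each n-gram of length n expands into 3^n patterns in this sketch, which hints at why the number of patterns grows quickly on a large corpus. One plausible way to distribute the work (again an assumption, not taken from the abstract) is to emit (pattern, match) pairs per sentence in a map step and merge them per pattern in a reduce step.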

Keywords: processing; word pattern; word; POS tags; corpus; proposed method

Journal Title: Cluster Computing
Year Published: 2018
