Writing essays and technical documents can be challenging for many people, especially non-native speakers. Good content and ideas are important, but clear and effective expressions that accurately convey those ideas to readers are essential for good writing. Many writers have difficulty selecting proper words that fit their sentences. Proper words may be widely used words that appear in similar contexts. These can be identified by statistical analysis of a corpus, which is a large collection of sentences. This paper proposes a method that recommends suitable words based on word pattern queries, which are expressed as a combination of words, part-of-speech (POS) tags, and wild card words, such as ‘ {1:2} idea.’ For the POS tags in a word pattern query, the proposed method recommends matching words along with their popularity and example sentences from the corpus. To support such query processing, the method first performs POS tagging on all sentences in the corpus. From the tagged sentences, it generates 2-grams through 5-grams, which consist of words, POS tags, and the special wild card word symbol ‘*’. It then builds an inverted file-like data structure that keeps the relevant information for each potential word pattern derived from the n-grams. Because the number of word patterns and sentences is large, MapReduce algorithms are developed to realize the proposed method, and HBase is deployed to manage the inverted file-like data structure. Experimental results are presented to show the characteristics of the proposed method.
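
The indexing and lookup steps described above can be illustrated with a small in-memory sketch. This is not the paper's MapReduce and HBase implementation. It assumes the sentences are already POS-tagged as (word, tag) pairs, that tags are upper-case and words are lower-cased, and that every slot of an n-gram may appear in a pattern as the word, its POS tag, or the wild card ‘*’. The names `patterns_for_ngram`, `build_index`, and `recommend` are hypothetical, and ranged wild cards such as `{1:2}` are not covered.

```python
from collections import defaultdict
from itertools import product

WILDCARD = "*"  # the special wild card word symbol


def patterns_for_ngram(ngram):
    """Yield every pattern for one n-gram of (word, tag) pairs.

    Assumption: each slot may be the word, its POS tag, or the wild card.
    """
    choices = [(word.lower(), tag, WILDCARD) for word, tag in ngram]
    yield from product(*choices)


def build_index(tagged_sentences, min_n=2, max_n=5):
    """Build an inverted-file-like mapping:
    pattern -> concrete word sequence -> occurrence count and sentence ids.
    """
    index = defaultdict(
        lambda: defaultdict(lambda: {"count": 0, "sentences": set()})
    )
    for sid, sentence in enumerate(tagged_sentences):
        for n in range(min_n, max_n + 1):
            for start in range(len(sentence) - n + 1):
                ngram = sentence[start:start + n]
                words = tuple(word.lower() for word, _ in ngram)
                # set() avoids double counting when a token equals its tag
                for pattern in set(patterns_for_ngram(ngram)):
                    entry = index[pattern][words]
                    entry["count"] += 1
                    entry["sentences"].add(sid)
    return index


def recommend(index, pattern):
    """Return matching word sequences, most frequent first,
    with their counts and the ids of example sentences."""
    matches = index.get(tuple(pattern), {})
    ranked = sorted(matches.items(), key=lambda kv: (-kv[1]["count"], kv[0]))
    return [
        (words, info["count"], sorted(info["sentences"]))
        for words, info in ranked
    ]


# Toy corpus, hand-tagged for illustration only.
corpus = [
    [("a", "DT"), ("good", "JJ"), ("idea", "NN")],
    [("a", "DT"), ("good", "JJ"), ("idea", "NN")],
    [("a", "DT"), ("brilliant", "JJ"), ("idea", "NN")],
]
index = build_index(corpus)
for words, count, sentence_ids in recommend(index, ["a", "JJ", "idea"]):
    print(" ".join(words), count, sentence_ids)
```

In this sketch each n-gram of length n expands into up to 3^n patterns, so the index grows quickly with corpus size, which is consistent with the abstract's motivation for distributing the work with MapReduce and storing the result in HBase.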
               