LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Key Concept Identification: A Sentence Parse Tree-Based Technique for Candidate Feature Extraction From Unstructured Texts

The effectiveness of automatic key concept or keyphrase identification from unstructured text documents mainly depends on a comprehensive and meaningful list of candidate features extracted from the documents. However, the… Click to show full abstract

The effectiveness of automatic key concept or keyphrase identification from unstructured text documents mainly depends on a comprehensive and meaningful list of candidate features extracted from the documents. However, the conventional techniques for candidate feature extraction limit the performance of keyphrase identification algorithms and need improvement. The objective of this paper is to propose a novel parse tree-based approach for candidate feature extraction to overcome the shortcomings of the existing techniques. Our proposed technique is based on generating a parse tree for each sentence in the input text. Sentence parse trees are then cut into sub-trees to extract branches for candidate phrases (i.e., noun, verb, and so on). The sub-trees are combined using parts-of-speech tagging to generate the flat list of candidate phrases. Finally, filtering is performed using heuristic rules and redundant phrases are eliminated to generate final list of candidate features. Experimental analysis is conducted for validation of the proposed scheme using three manually annotated and publicly available data sets from different domains, i.e., Inspec, 500N-KPCrowed, and SemEval-2010. The proposed technique is fine-tuned to determine the optimal value for the parameter context window size and then it is compared with the existing conventional n-gram and noun-phrase-based techniques. The results show that the proposed technique outperforms the existing approaches and significant improvements of 13.51% and 30.67%, 12.86% and 5.48%, and 13.16% and 31.46% are achieved, in terms of precision, recall, and F-measure when compared with noun-phrase-based scheme and n-gram-based scheme, respectively. These results give us confidence to further validate the proposed technique by developing a keyphrase extraction algorithm in the future.

Keywords: technique; candidate; parse tree; candidate feature; feature extraction

Journal Title: IEEE Access
Year Published: 2018

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.