This paper presents ContextMiner, a novel natural language processing (NLP) framework that automatically captures contextual features in order to extract meaningful context-aware phrases from unstructured cybersecurity text. To extract these contextual features, the framework combines basic linguistic attributes, such as part-of-speech tags and dependency parses, with a domain-specific grammar. We evaluate the effectiveness and applications of ContextMiner from two perspectives, qualitative and quantitative. Qualitatively, our case studies show that, compared with existing approaches, the proposed framework retrieves additional content from the given texts in both labeled and unlabeled settings, and can therefore build context-aware phrases. Quantitatively, we evaluate ContextMiner as a pre-processing step for named entity recognition (NER). Our results show that ContextMiner reduces the corpus by up to 70% while retaining 85% of its relevant entities, at the cost of a small drop in classification metrics. Finally, we explore the use of ContextMiner in constructing and reasoning over knowledge graphs.
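The grammar-driven phrase extraction and the corpus-reduction idea can be illustrated with a minimal sketch. It makes several assumptions. Tokens are taken to be already tagged with Universal POS tags (in practice a tagger such as spaCy, via `token.pos_`, could supply them). The two-rule grammar is made up for illustration and is not ContextMiner's actual domain-specific grammar. Dependency parsing is ignored entirely. The helper names (`extract_spans`, `span_text`, `reduction_and_retention`) and the two metrics are hypothetical stand-ins for how corpus reduction and entity retention might be measured.

```python
import re
from typing import List, Tuple

# Hypothetical input: (word, Universal POS tag) pairs produced by some tagger.
# Dependency parsing, which the framework also uses, is not modeled here.
Token = Tuple[str, str]
Span = Tuple[int, int]

# Map each tag to one character so the grammar can be a plain regex and
# character offsets equal token offsets. Unlisted tags become "O" (other).
TAG_CODES = {"ADJ": "A", "NOUN": "N", "PROPN": "N", "NUM": "M",
             "ADP": "P", "DET": "D"}

# Illustrative grammar (an assumption, not ContextMiner's actual rules):
#   NP     := (ADJ | NUM | NOUN | PROPN)* (NOUN | PROPN)
#   PHRASE := NP (ADP DET? NP)*
# so a prepositional attachment ("... in the Linux kernel") stays with its head.
PHRASE_RE = re.compile(r"[AMN]*N(?:PD?[AMN]*N)*")


def extract_spans(tokens: List[Token]) -> List[Span]:
    """Return non-overlapping [start, end) token spans matching the grammar."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tokens)
    return [m.span() for m in PHRASE_RE.finditer(codes)]


def span_text(tokens: List[Token], span: Span) -> str:
    """Join the words covered by a token span."""
    start, end = span
    return " ".join(word for word, _ in tokens[start:end])


def reduction_and_retention(tokens: List[Token], spans: List[Span],
                            entities: List[Span]) -> Tuple[float, float]:
    """Hypothetical metrics: the share of tokens dropped when only phrase
    tokens are kept, and the share of gold entity spans that lie fully
    inside some kept phrase."""
    kept = sum(end - start for start, end in spans)
    reduction = 1 - kept / len(tokens)
    retained = sum(
        any(s <= es and ee <= e for s, e in spans) for es, ee in entities
    )
    retention = retained / len(entities) if entities else 1.0
    return reduction, retention


# A made-up, hand-tagged example sentence.
SAMPLE_SENTENCE: List[Token] = [
    ("The", "DET"), ("attacker", "NOUN"), ("exploited", "VERB"),
    ("a", "DET"), ("buffer", "NOUN"), ("overflow", "NOUN"),
    ("in", "ADP"), ("the", "DET"), ("Linux", "PROPN"),
    ("kernel", "NOUN"), (".", "PUNCT"),
]

if __name__ == "__main__":
    spans = extract_spans(SAMPLE_SENTENCE)
    print([span_text(SAMPLE_SENTENCE, s) for s in spans])
    # prints ['attacker', 'buffer overflow in the Linux kernel']
    gold_entities = [(8, 10)]  # "Linux kernel", a made-up annotation
    reduction, retention = reduction_and_retention(
        SAMPLE_SENTENCE, spans, gold_entities)
    print(round(reduction, 2), retention)  # prints 0.36 1.0
```

Encoding each tag as a single character keeps regex offsets aligned with token offsets, so a match span maps directly to a token span. A fuller system would also need dependency information to attach context that is not adjacent in the sentence, which this sketch does not attempt.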
               
Click one of the above tabs to view related content.