Fine-grained named entity recognition (NER) is crucial to natural language processing (NLP) applications such as relation extraction and knowledge graph construction. Most existing fine-grained NER systems are inefficient because they rely on manually annotated training datasets. To address this issue, our NER system automatically generates datasets from Wikipedia under the distant supervision paradigm, mapping hyperlinks in Wikipedia documents to Freebase. In addition, previous NER models cannot effectively handle fine-grained label sets with more than 100 types, so we introduce a 'BIO' tagging strategy that identifies the position and type attributes of each token simultaneously. This tagging scheme recasts NER as a sequence-to-sequence (seq2seq) problem. We propose FSeqC, a seq2seq framework that models the input sentence comprehensively. Specifically, we adopt a Bi-LSTM encoder to weigh the past and future context of the input equally, and add a self-attention mechanism to handle long-term dependencies in long sequences. When classifying entity tags, we choose a CRF model because it adds constraints that rule out logically invalid tag transitions. Experiments on large-scale datasets for fine-grained NER tasks verify the effectiveness of FSeqC, which consistently and significantly outperforms other state-of-the-art alternatives.
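As a concrete illustration of the tagging strategy, the sketch below assigns each token a position prefix (B/I/O) fused with a fine-grained type, so both attributes are identified at once. The Freebase-style type paths (e.g. /person/artist) and the example sentence are illustrative assumptions, not the paper's exact type inventory.

```python
# Minimal sketch of the 'BIO' tagging scheme from the abstract: each token
# gets a position prefix (B = begin, I = inside, O = outside) combined with
# a fine-grained entity type. The "/person/artist" path format is an
# assumed Freebase-style convention for illustration.

def bio_tag(tokens, spans):
    """spans: list of (start, end, fine_grained_type); end is exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"            # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"            # continuation tokens
    return tags

tokens = ["Bob", "Dylan", "was", "born", "in", "Duluth"]
spans = [(0, 2, "/person/artist"), (5, 6, "/location/city")]
print(list(zip(tokens, bio_tag(tokens, spans))))
# [('Bob', 'B-/person/artist'), ('Dylan', 'I-/person/artist'),
#  ('was', 'O'), ('born', 'O'), ('in', 'O'),
#  ('Duluth', 'B-/location/city')]
```

With hyperlinked Wikipedia entity spans mapped to Freebase types, this conversion is what lets the distant supervision pipeline emit training labels without manual annotation.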
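The following is a minimal PyTorch sketch of the encoder-attention-CRF pipeline the abstract describes. Layer sizes, the multi-head form of self-attention, and the use of the third-party `pytorch-crf` package (`pip install pytorch-crf`) are assumptions for illustration; the paper's actual FSeqC architecture may differ in detail.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pytorch-crf (assumed here)

class BiLSTMAttnCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Bi-LSTM encoder: reads the sentence in both directions so past
        # and future information are processed equally.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        # Self-attention over the encoder states to relate distant tokens
        # and ease the long-term dependency problem.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_tags)
        # CRF layer: learned transition scores constrain the output so
        # positionally invalid sequences (e.g. 'I-x' right after 'O') are
        # penalized.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        x = self.emb(token_ids)
        h, _ = self.lstm(x)
        h, _ = self.attn(h, h, h)          # self-attention: Q = K = V = h
        emissions = self.proj(h)
        if tags is not None:               # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best paths
```

At inference time, `decode` returns the highest-scoring tag sequence per sentence via Viterbi search, which is where the CRF's transition constraints take effect.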