An MRC and adaptive positive‐unlabeled learning framework for incompletely labeled named entity recognition

Named entity recognition (NER) is currently evaluated mainly on standard, well‐annotated data sets. However, constructing a well‐annotated data set consumes a great deal of manpower and time. In many NER applications, data sets contain considerable noise, and a large part of that noise comes from unlabeled entities. At present, the training process of most models treats unlabeled entities as nonentities, which biases these models toward predicting most words of an input context as nonentities and greatly degrades their performance. In this paper, as a first attempt, we propose an adaptive positive‐unlabeled (adaPU) learning technique and integrate it into a machine reading comprehension (MRC) framework for NER, which still performs well on data sets with a large proportion of unlabeled entities. In our framework, to address the problem that a model may predict most words of an input context as nonentities, adaPU learning adjusts the loss coefficient of positive and negative samples. Moreover, instead of constructing a single fixed query for each entity type as input to MRC, we propose a method of dynamically constructing multiple queries for each entity type, which also brings a slight performance improvement for NER. Accordingly, we explore new training and entity inference strategies for our learning framework. The experimental results show that our framework is effective on data sets that contain a large number of unlabeled entities. When the proportion of unlabeled entities reaches 50%, our framework remains effective and maintains F1 scores above 80 on several data sets. The experiments also show that our framework achieves better or competitive performance on standard data sets. The ablation experiments further demonstrate that our MRC framework with adaPU learning and the dynamic query construction method improves NER performance.
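
The abstract does not give the adaPU loss itself, so the following is only a rough sketch of the general positive‐unlabeled idea it builds on: unlabeled tokens are not simply counted as nonentities, and a coefficient rebalances the positive and negative terms. It uses the standard non‐negative PU risk estimator as a stand‐in, not the paper's adaptive variant. The function names and the `prior` parameter (an assumed fraction of true entity tokens) are hypothetical and chosen for illustration.

```python
import math


def log_loss(p, target):
    """Binary cross-entropy for one predicted entity probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -math.log(p) if target == 1 else -math.log(1.0 - p)


def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def nn_pu_risk(pos_probs, unl_probs, prior):
    """Non-negative PU risk (a standard estimator, NOT the paper's adaPU).

    pos_probs: predicted entity probabilities for tokens labeled as entities
    unl_probs: predicted entity probabilities for unlabeled tokens
    prior:     assumed share of true entity tokens (hypothetical parameter)
    """
    risk_pos = prior * mean([log_loss(p, 1) for p in pos_probs])
    # Negative-class risk is estimated from the unlabeled tokens and then
    # corrected for the entities hidden among them; it is clamped at zero
    # so the corrected term can never become negative.
    risk_neg = (mean([log_loss(p, 0) for p in unl_probs])
                - prior * mean([log_loss(p, 0) for p in pos_probs]))
    return risk_pos + max(0.0, risk_neg)


def naive_risk(pos_probs, unl_probs, prior):
    """Baseline that treats every unlabeled token as a nonentity."""
    return (prior * mean([log_loss(p, 1) for p in pos_probs])
            + (1.0 - prior) * mean([log_loss(p, 0) for p in unl_probs]))


if __name__ == "__main__":
    labeled = [0.9, 0.8]          # tokens annotated as entities
    unlabeled = [0.1, 0.2, 0.9]   # the 0.9 token may be a missed entity
    print("PU risk:   ", nn_pu_risk(labeled, unlabeled, prior=0.3))
    print("naive risk:", naive_risk(labeled, unlabeled, prior=0.3))
```

In this toy input the naive baseline penalizes the confident prediction on the unlabeled token more heavily than the PU estimator does, which illustrates the pressure toward predicting nonentities that the abstract describes. How adaPU adapts its coefficient during training is not stated in the abstract and is not modeled here.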

Keywords: entity recognition; entity; framework; unlabeled entities; named entity; data sets

Journal Title: International Journal of Intelligent Systems
Year Published: 2022
