Named entity recognition (NER), which provides useful information for many high-level NLP applications and semantic web technologies, is a well-studied topic for most languages, especially English. However, studies on Turkish, a morphologically richer and less-studied language, lagged behind for a long time. In recent years, Turkish NER has attracted researchers because of its scarce data resources and the lack of high-performing systems. In particular, the need to discover named entities occurring in Web datasets has initiated many studies in this field. This article presents the enhancements made to a Turkish named entity recognition model [5], which is based on conditional random fields (CRFs) and was originally tailored for well-formed texts, in order to extend the named entity types it covers and to process the more challenging user-generated content that comes with Web 2.0. The article introduces the re-annotation of the available datasets to extend the covered named entity types, along with a brand-new dataset from Web 2.0. The introduced approach achieves an exact-match F1 score of 92% on a dataset collected from Turkish news articles and ∼65% on different datasets collected from Web 2.0.
               