This paper focuses on the word sense disambiguation (WSD) problem in the context of Urdu language. Word sense disambiguation (WSD) is a phenomena for disambiguating the text so that machine… Click to show full abstract
This paper focuses on the word sense disambiguation (WSD) problem in the context of Urdu language. Word sense disambiguation (WSD) is a phenomena for disambiguating the text so that machine (computer) would be capable to deduce correct sense of individual given word(s). WSD is critical for solving natural language engineering (NLE) tasks such as machine translation and speech processing etc. It also increase the performance of other tasks such as text retrieval, document classification and document clustering etc. Research work in WSD has been conducted up to different extents in computationally developed languages of the world. In the context of Urdu language the NLE research in general and the WSD research in particular is still in the infancy stage due to the rich morphological structure of Urdu. In this paper, we use machine learning (ML) approaches such as Bayes net classifier (BN), support vector machine (SVM) and decision tree (DT) for WSD in native script Urdu text. The results shown that BN has better F-measure than SVM and DT. The maximum F-measure of 0.711 over 2.5 million words raw Urdu corpus was recorded for the Bayes net classifier.
               
Click one of the above tabs to view related content.