Abstract In Information Retrieval systems, stemming handles the words that can occur in different morphological forms, and hence matches the terms of the documents and the queries that are related… Click to show full abstract
Abstract In Information Retrieval systems, stemming handles the words that can occur in different morphological forms, and hence matches the terms of the documents and the queries that are related in meanings. In this article, we have proposed a cognitive inspired language-independent stemming that learns group of morphologically related words from the ambient corpus without any linguistic knowledge or human intervention and it behaves in a way the human brain works. The main idea of our proposed algorithm is to determine only those variants of the words from the ambient corpus that match the original intent of the query terms. We conducted ad-hoc retrieval experiments in a number of languages of varying morphological complexity using standard TREC, FIRE, and CLEF document collection. The results indicate that stemming improves the retrieval accuracy and the effectiveness of stemming algorithm increases with the increase in the morphological complexity of algorithm. The results also indicates that the performance of our proposed algorithm is better than the stemmers based on linguistic knowledge and other state-of-the-art statistical stemmers in almost all the languages under study. In multi-lingual setup these results are quite encouraging.
               
Click one of the above tabs to view related content.