Abstract
We introduce a method for measuring the quantity of online content available in each of a set of languages at the domain level. This measurement is used to build a Multi-Lingual Information Retrieval (MLIR) system that identifies which languages are strongly represented on the internet for a specific query topic. The system architecture consists of two modules. The off-line module builds a linguistic diversity index for the languages at the topic level. The on-line module uses this index to identify the most suitable language for the search and then retrieves documents relevant to the user query in that language. The experiments explore the usefulness of building such an index and the effect of using it in both a monolingual and a traditional MLIR system. The results show that the more internet resources a language has on a topic, the more accurate the retrieved results, and therefore the better the system performance.
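The two-module architecture can be illustrated with a minimal sketch. The abstract does not say how the linguistic diversity index is computed. Here we assume, purely for illustration, that it is each language's share of the online documents found for a topic, and that the on-line module picks the language with the largest share. The input format and the names `build_diversity_index` and `select_language` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_diversity_index(counts):
    """Off-line module sketch (hypothetical).

    `counts` is an assumed input format: (topic, language, document_count)
    triples, e.g. gathered by querying a search engine per topic and language.
    Returns {topic: {language: share_of_topic_documents}}.
    """
    totals = defaultdict(int)
    for topic, _lang, n in counts:
        totals[topic] += n
    index = defaultdict(dict)
    for topic, lang, n in counts:
        # Skip topics with no documents at all to avoid division by zero.
        if totals[topic] > 0:
            index[topic][lang] = n / totals[topic]
    return dict(index)


def select_language(index, topic, default="en"):
    """On-line module sketch (hypothetical): choose the best-represented language.

    Falls back to an assumed default language when the topic is not indexed.
    """
    shares = index.get(topic)
    if not shares:
        return default
    return max(shares, key=shares.get)


if __name__ == "__main__":
    counts = [
        ("football", "en", 800),
        ("football", "es", 1200),
        ("cuisine", "fr", 900),
        ("cuisine", "en", 600),
    ]
    index = build_diversity_index(counts)
    print(select_language(index, "football"))  # es
    print(select_language(index, "cuisine"))   # fr
```

In a real system the selected language would then be passed to a retrieval component that searches documents in that language. A proportional share is only one possible choice of index; the paper may use a different measure.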
               
Click one of the above tabs to view related content.