Along with the emergence of Web 2.0, User Generated Content (UGC) has become increasingly important for knowledge sharing. Wikipedia, the world’s largest community-based collaborative encyclopedia, is also one of the biggest UGC databases in the world. Because of its open and collaborative nature, Wikipedia faces a significant Information Quality (IQ) problem. Malicious users, for example, exploit Wikipedia’s popularity on the WWW to carry out attacks such as link spamming. As a result, Wikipedia is generally not recommended for academic work, even though some of its articles are both information-rich and of high quality. Existing approaches for assessing Wikipedia’s IQ rely on statistical models and machine learning algorithms, but these models do not produce satisfactory results. In this study, a novel theoretical model based on Google’s E-A-T framework is introduced to assess Wikipedia’s IQ. The model comprises three IQ constructs: Expertise, Authority and Trustworthiness. Based on the empirical findings, a set of IQ dimensions that influence these three constructs was identified, together with 45 IQ attributes for measuring those dimensions. The IQ attributes were extracted automatically and inexpensively from the content and metadata statistics of Wikipedia articles using a Selenium 3.14 web automation script. A sample of 2000 articles, comprising 1000 Featured Articles (FA) and 1000 non-FA articles from six WikiProjects, was used for the data analysis. The proposed model was compared with three previously published models in terms of classification and clustering accuracy. It achieved classification and clustering accuracies of 95% and 93% respectively, a substantial improvement over the existing models. Furthermore, an average inter-rater agreement of 84% was observed. These results from an extensive experiment support the effectiveness of the proposed model.
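The evaluation figures quoted above (classification accuracy and average inter-rater agreement) are proportions of matching labels. The Python sketch below shows one common way to compute them. It assumes that agreement is measured as mean pairwise percent agreement between raters. The abstract does not state the exact measure, and the study may instead use a chance-corrected statistic such as Cohen's or Fleiss' kappa. The function names and the toy FA / non-FA labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def accuracy(predicted, actual):
    """Fraction of articles whose predicted label (e.g. "FA" / "non-FA") matches the reference label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def average_pairwise_agreement(ratings):
    """Mean percent agreement over every pair of raters.

    Assumption (not from the paper): `ratings` maps a rater name to a list of
    labels, one per article, with all lists in the same article order.
    """
    raters = list(ratings)
    if len(raters) < 2:
        raise ValueError("need at least two raters")
    # For a pair of raters, agreement is the share of articles they labelled identically.
    pair_scores = [accuracy(ratings[a], ratings[b]) for a, b in combinations(raters, 2)]
    return sum(pair_scores) / len(pair_scores)


if __name__ == "__main__":
    # Hypothetical toy data, not the study's dataset.
    actual = ["FA", "FA", "non-FA", "non-FA"]
    predicted = ["FA", "non-FA", "non-FA", "non-FA"]
    print(accuracy(predicted, actual))  # 0.75

    ratings = {
        "rater_1": ["FA", "FA", "non-FA", "non-FA"],
        "rater_2": ["FA", "non-FA", "non-FA", "non-FA"],
        "rater_3": ["FA", "FA", "non-FA", "FA"],
    }
    print(round(average_pairwise_agreement(ratings), 3))  # 0.667
```

Plain percent agreement is easy to interpret, but it does not correct for agreement expected by chance. A kappa-style statistic is the usual alternative when the label distribution is skewed.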
This study contributes to the related knowledge area by introducing a novel framework for assessing the IQ of Wikipedia articles. Its limitations include the domain specificity of the chosen dataset and its exclusive focus on the English language. However, the results could be generalized by enlarging the dataset and replicating the study for the other domains and languages supported by Wikipedia.
               