As the internet grows, web spam is increasing as well, and it significantly degrades the user experience with search engines. Web spam methods target a search engine’s internal ranking programs to push targeted web sites to the top positions. This paper proposes an intelligent oversampling approach based on general type-2 fuzzy sets that balances the class distribution and thereby improves classification performance for web spam detection. The proposed method is validated on the real-world benchmark dataset WEBSPAM-UK 2007, and its performance is assessed with AUC (area under the ROC curve), F-measure, and G-mean. It is compared with SMOTE in combination with 11 well-known base classifiers available in the WEKA tool. The computational complexity of the proposed method is the same as that of SMOTE. The reported results show that the proposed method, combined with the base classifiers, boosts each classifier’s performance and outperforms SMOTE in every case. The proposed combinations are also analyzed statistically using the Friedman, Holm, and Wilcoxon tests to identify the best combination among the 11 base classifiers. The analysis indicates that the proposed method in combination with random forest (GT2FS-SMOTE+RF) performed best among all combinations.
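For readers unfamiliar with the baseline and the metrics named above, the following is a minimal sketch of plain SMOTE interpolation together with G-mean and F-measure computed from confusion-matrix counts. It is not the paper's GT2FS-based method, whose details the abstract does not give. The function names (`smote_sample`, `g_mean`, `f_measure`), the parameters, and the toy data in the test values are illustrative assumptions.

```python
import math
import random

def smote_sample(minority, k=2, n_new=4, seed=0):
    """Plain SMOTE (the baseline, not the paper's method): create n_new
    synthetic points, each on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def f_measure(tp, fn, fp):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

G-mean and F-measure are commonly preferred over plain accuracy on imbalanced data such as spam detection, because a classifier that ignores the minority class scores poorly on both.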