LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

On Term Frequency Factor in Supervised Term Weighting Schemes for Text Classification

Photo from wikipedia

The performance of text classification can be affected by the choice of appropriate term weighting scheme as well as other parameters. The terminology supervised term weighting scheme has become popular… Click to show full abstract

The performance of text classification can be affected by the choice of appropriate term weighting scheme as well as other parameters. The terminology supervised term weighting scheme has become popular in recent years, as it may provide discriminative representation in vector space for text documents belonging to different classes. A term weighting scheme generally consists of three factors, namely term frequency factor, collection frequency factor, and length normalization factor. The researchers mostly have been focused on developing new collection frequency factors in term weighting studies. However, the term frequency factor has an important role, especially in supervised term weighting. In this study, we extensively analyzed the effects of using different term frequency factors on seven supervised term weighting schemes. While six of these supervised term weighting schemes were applied in the previous studies in the literature, we derived one of them from an existing feature selection method and it was not used as a weighting method before. This analysis is performed using SVM and Roccio classifiers on two widely known benchmark datasets with different characteristics. Experimental results showed that modification of term frequency factor in supervised term weighting schemes increased the performance of almost all weighting schemes. Also, term weighting schemes using square root function-based term frequency factor (SQRT_TF) are more successful than the ones using term frequency (TF) and logarithmic function-based term frequency (LOG_TF) factors. TF term frequency factor seems as the least effective one among three different term frequency factors according to the experimental results and statistical analysis.

Keywords: term; term frequency; frequency; term weighting; frequency factor

Journal Title: Arabian Journal for Science and Engineering
Year Published: 2019

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.