Termset weighting by adapting term weighting schemes to utilize cardinality statistics for binary text categorization

This study proposes a novel scheme for termset weighting based on cardinality statistics. Specifically, a termset is evaluated by considering how many of its member terms appear in a document. Existing term weighting schemes are adapted on the basis of a recently verified hypothesis that the occurrence of a subset of the terms may also convey useful information about class membership. Here, the weight of a given termset is computed as the product of two factors. The first is a function of the frequencies of the member terms that occur in the given document, and the second takes into account the numbers of positive and negative training documents in which the same number of members appear. By assigning a non-zero weight to a termset when only a subset of its member terms appears, the discriminative ability of different member term subsets is taken into consideration.
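
The two-factor product described above can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the document factor is taken as log(1 + sum of member term frequencies), the collection factor borrows the form of the relevance frequency scheme applied to cardinality counts, and the function names (cardinality, cardinality_counts, termset_weight) are hypothetical. It should not be read as the scheme evaluated in the paper.

```python
import math
from collections import Counter


def cardinality(termset, doc_tf):
    """Number of member terms of `termset` that occur in the document."""
    return sum(1 for term in termset if doc_tf.get(term, 0) > 0)


def cardinality_counts(termset, docs):
    """Map each cardinality k to the number of documents with exactly k members."""
    return Counter(cardinality(termset, doc) for doc in docs)


def termset_weight(termset, doc_tf, pos_docs, neg_docs):
    """Weight of `termset` in one document as a product of two factors.

    Both factors are illustrative assumptions, not the paper's formulas.
    """
    k = cardinality(termset, doc_tf)
    if k == 0:
        # No member term appears, so the termset gets zero weight.
        return 0.0

    # Factor 1 (assumed form): dampened sum of member term frequencies.
    tf_sum = sum(doc_tf.get(term, 0) for term in termset)
    doc_factor = math.log(1 + tf_sum)

    # Factor 2 (assumed form): relevance-frequency-style ratio of positive
    # to negative training documents containing exactly k member terms.
    p_k = cardinality_counts(termset, pos_docs)[k]
    n_k = cardinality_counts(termset, neg_docs)[k]
    collection_factor = math.log2(2 + p_k / max(1, n_k))

    return doc_factor * collection_factor


# Toy data: each document is a dict of term -> frequency.
termset = {"neural", "network"}
pos_docs = [{"neural": 2, "network": 1}, {"neural": 1}, {"network": 3, "deep": 1}]
neg_docs = [{"neural": 1}, {"stock": 2}]

# Only one of the two member terms appears, yet the weight is non-zero.
partial = termset_weight(termset, {"neural": 1, "market": 2}, pos_docs, neg_docs)
print(round(partial, 4))
```

In this sketch a document containing only "neural" still receives a non-zero weight, and the collection factor depends on how many training documents of each class contain exactly one member term, which mirrors the role the abstract assigns to subsets of member terms.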

Keywords: termset weighting; term; weighting schemes; term weighting; cardinality statistics; termset

Journal Title: Applied Intelligence
Year Published: 2017
