A Thorough Evaluation of Distance-Based Meta-Features for Automated Text Classification

We address the problem of automatically learning to classify texts by exploiting information derived from meta-features, i.e., features derived from the original bag-of-words representation. Specifically, we provide an in-depth analysis of the recently proposed distance-based meta-features, a data engineering technique that relies on the distances between documents to transform the original feature space into a new one that is potentially smaller and more informative. Despite its potential, the meta-feature space may be unnecessarily complex and high-dimensional. This increases the tendency to overfit, limits the application of meta-features in different contexts, and increases computational costs. In this work, we propose the use of multi-objective strategies to reduce the number of meta-features while maximizing classification effectiveness, taking into account how well the selected meta-features suit a particular dataset or classification method. We present effective and efficient proposals for meta-feature selection that can reduce the number of meta-features by up to 89 percent while maintaining or improving classification effectiveness, something none of the evaluated baselines achieved. We also use our selection strategies as evaluation tools to analyze different combinations of meta-features. We found very compact combinations of meta-features that achieve high classification effectiveness in most datasets, despite their peculiarities.
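The two ideas in this abstract, building meta-features from document distances and then choosing compact meta-feature subsets under two competing objectives, can be illustrated with a small sketch. It assumes one common family of distance-based meta-features: for each class, the cosine distances from a document to its k nearest training documents of that class. The abstract does not say this is exactly the authors' feature set. The selection step is a generic Pareto-dominance filter over (number of meta-features, effectiveness) pairs, not necessarily the authors' multi-objective search. All function names and the toy scores are hypothetical.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance between two sparse bag-of-words vectors (dict: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (na * nb)


def knn_distance_meta_features(doc, train_docs, train_labels, classes, k=3):
    """Hypothetical distance-based meta-features: for each class, the k smallest
    distances from `doc` to training documents of that class, in ascending order."""
    by_class = defaultdict(list)
    for d, y in zip(train_docs, train_labels):
        by_class[y].append(cosine_distance(doc, d))
    features = []
    for c in classes:
        dists = sorted(by_class[c])[:k]
        dists += [1.0] * (k - len(dists))  # pad with the maximum distance if a class has < k docs
        features.extend(dists)
    return features  # length = len(classes) * k, regardless of vocabulary size


def pareto_front(candidates):
    """Keep the non-dominated (name, n_features, score) tuples: fewer meta-features
    is better and a higher effectiveness score is better."""
    front = []
    for name, n, s in candidates:
        dominated = any(
            n2 <= n and s2 >= s and (n2 < n or s2 > s)
            for _, n2, s2 in candidates
        )
        if not dominated:
            front.append((name, n, s))
    return front


if __name__ == "__main__":
    train = [{"a": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}, {"c": 1.0, "d": 1.0}]
    labels = ["x", "x", "y", "y"]
    print(knn_distance_meta_features({"a": 1.0}, train, labels, ["x", "y"], k=2))
    # Toy (made-up) effectiveness scores for candidate meta-feature subsets.
    print(pareto_front([("all", 100, 0.80), ("A", 11, 0.81), ("B", 11, 0.78),
                        ("C", 5, 0.75), ("D", 20, 0.79)]))
```

The meta-feature vector has `len(classes) * k` entries however large the vocabulary is, which is why such a space can be smaller than the bag-of-words space. Adding more meta-feature groups, though, quickly makes it high-dimensional again. In a real setting each candidate subset's score would come from training and evaluating a classifier, for example with cross-validation, and the Pareto front exposes the trade-off between compactness and effectiveness that the selection strategies navigate.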

Keywords: meta-features; classification; distance-based

Journal Title: IEEE Transactions on Knowledge and Data Engineering
Year Published: 2018
