"Learning from data streams and class imbalance"


With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues. Applications in various domains such as risk management, anomaly detection, fraud detection, software engineering, social media mining, and recommender systems are affected by both class imbalance and concept drift. Class imbalance happens when the data categories are not equally represented, i.e., at least one category is a minority compared to the other categories. It can cause learning bias towards the majority class and poor generalisation. Concept drift is a change in the underlying distribution of the problem and is a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Class imbalance and concept drift can significantly hinder predictive performance. The problem becomes particularly challenging when they occur simultaneously, because one problem can affect the treatment of the other.

This special issue is composed of four high-quality papers studying the challenges and proposing new solutions for learning from data effectively and efficiently in the presence of class imbalance or concept drift.

Resampling is the most popular set of techniques to overcome class imbalance in data. One-class learning methods can also be effective on imbalanced data. In the paper “Learning in Presence of Class Imbalance and Class Overlapping by Using One-class SVM and Undersampling Technique”, the authors propose using both undersampling and one-class learning to preprocess data. A one-class SVM is used to detect overlapping regions, and Tomek links are used to further clean up the overlapping data and balance the data set. The proposed method is compared with six other state-of-the-art methods over seven binary and two multi-class data sets, showing better accuracy on minority classes without harming majority-class accuracy.
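To make the Tomek-link cleaning step more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it leaves out the one-class SVM stage entirely, uses brute-force nearest-neighbour search, and the function names (`tomek_links`, `remove_majority_in_links`) and the toy data are assumptions made for illustration. A Tomek link is taken here in its usual sense: two points with different labels that are each other's nearest neighbour.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j) with i < j that are mutual nearest neighbours
    and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_in_links(points, labels, majority_label):
    """Undersample by dropping the majority-class member of each Tomek link."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: class 0 is the majority, class 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0), (3.0, 0.0)]
labels = [0, 0, 0, 1, 0, 1]

print(tomek_links(points, labels))
clean_points, clean_labels = remove_majority_in_links(points, labels, majority_label=0)
print(clean_labels)
```

In this toy set, the majority point at (1.0, 0.0) and the minority point at (1.1, 0.0) are mutual nearest neighbours with different labels, so only that majority point is removed. For real data, a library implementation with an indexed neighbour search would be the more practical choice than this quadratic loop.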
Different from data-level techniques and cost-sensitive methods, the paper “A Weighted Pattern Matching Approach for Classification of Imbalanced Data with a Fireworks Based Algorithm for Feature Selection” proposes a novel weighted pattern matching method to classify imbalanced data, combined with a fireworks algorithm that selects the best set of features for learning. The experiments were performed on 44 binary and 15 multi-class imbalanced data sets. The proposed method showed competitive performance in comparison with other state-of-the-art methods.

As mentioned earlier, the class imbalance issue exists in many real-world applications. The paper “Semantic Segmentation of High Resolution Remote Sensing Images Using Fully Convolutional Network with Adaptive Threshold” studies class imbalance in semantic segmentation, which is a multi-class classification problem in remote sensing. To overcome class imbalance, a fully convolutional neural network with an adaptive threshold of the Jaccard index is proposed. The experimental results showed superior classification performance on remote sensing images.
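For readers unfamiliar with the Jaccard index, the sketch below computes it per class on flattened label maps. It only illustrates the metric itself. The adaptive thresholding scheme is not described in this summary and is not reproduced here, and the function name `jaccard_per_class` and the toy labels are assumptions. For a class c, the index is the number of pixels labelled c in both maps divided by the number labelled c in either map.

```python
def jaccard_per_class(y_true, y_pred, classes):
    """Jaccard index (intersection over union) for each class.

    y_true and y_pred are flat sequences of per-pixel class labels.
    A class absent from both sequences gets None instead of a score.
    """
    scores = {}
    for c in classes:
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        scores[c] = inter / union if union else None
    return scores


# Hypothetical flattened label maps: class 0 dominates, class 2 is rare.
y_true = [0, 0, 0, 0, 1, 1, 2, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = jaccard_per_class(y_true, y_pred, classes=[0, 1, 2])

print(f"pixel accuracy: {accuracy:.2f}")
for c, s in scores.items():
    print(f"class {c}: Jaccard = {s:.2f}")
```

In this toy case, six of eight pixels are correct, yet the rare class 2 is never predicted and scores a Jaccard index of zero. This is why per-class overlap measures are often preferred to plain pixel accuracy when classes are imbalanced.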

Keywords: concept drift; data streams; class; class imbalance; learning data

Journal Title: Connection Science
Year Published: 2019

