
A Synthetic Minority Oversampling Technique Based on Gaussian Mixture Model Filtering for Imbalanced Data Classification.



Data imbalance is a common phenomenon in machine learning. In imbalanced data classification, minority samples are far fewer than majority samples, which makes it difficult for classifiers to learn the minority class effectively. The synthetic minority oversampling technique (SMOTE) improves the sensitivity of classifiers to the minority class by synthesizing new minority samples without duplicating existing ones. However, the process of synthesizing new samples in the SMOTE algorithm may lead to problems such as "noisy samples" and "boundary samples." To address these problems, we propose a synthetic minority oversampling technique based on Gaussian mixture model filtering (GMF-SMOTE). GMF-SMOTE uses the expectation-maximization (EM) algorithm based on the Gaussian mixture model to group the imbalanced data. Then, an expectation-maximization filtering algorithm is used to filter out the "noisy samples" and "boundary samples" in the subclasses after grouping. Finally, to synthesize majority and minority samples, we design two dynamic oversampling ratios. Experimental results show that GMF-SMOTE performs better than traditional oversampling algorithms on 20 UCI datasets. The population averages of the sensitivity and specificity of random forest (RF) on the UCI datasets synthesized by GMF-SMOTE are 97.49% and 97.02%, respectively. In addition, we record the G-mean and MCC of the RF, which are 97.32% and 94.80%, respectively, significantly better than those of traditional oversampling algorithms. More importantly, two statistical tests show that GMF-SMOTE is significantly better than traditional oversampling algorithms.
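
For readers unfamiliar with the baseline, the following is a minimal sketch of the plain SMOTE interpolation step and of the four metrics the abstract reports (sensitivity, specificity, G-mean, MCC), computed from a binary confusion matrix using their standard definitions. It is not the GMF-SMOTE algorithm: the abstract does not specify the Gaussian mixture filtering rule or the two dynamic oversampling ratios, so neither is implemented here. The function names `smote_samples` and `binary_metrics`, the parameter `k`, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote_samples(minority, n_new, k=5, seed=0):
    """Plain SMOTE sketch (hypothetical helper, not GMF-SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def binary_metrics(tp, fn, tn, fp):
    """Standard definitions, with the minority class treated as positive."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Toy usage with made-up data (not taken from the paper).
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_samples(toy_minority, n_new=10, k=2)
toy_scores = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

Because every synthetic point is an interpolation between two existing minority points, a noisy or boundary point in the input can produce further noisy or boundary points. This is the weakness the abstract says GMF-SMOTE targets by filtering before oversampling.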

Keywords: synthetic minority; minority; imbalanced data; oversampling technique; minority oversampling; based gaussian

Journal Title: IEEE Transactions on Neural Networks and Learning Systems
Year Published: 2022
