The class imbalance problem, characterized by a distribution skewed towards the majority class, has emerged as a challenge in recent years. Many oversampling techniques have been proposed to cope with this problem, and some of them combine the oversampling procedure with a clustering algorithm, which guarantees that new synthetic samples are generated within clusters. However, samples that are far apart but belong to the same minority sub-region are generally clustered into different groups, owing to the characteristics of clustering algorithms themselves. Consequently, the subsequent oversampling is mostly carried out in incomplete minority sub-regions, so the synthetic samples do not cover the whole minority region well. To the best of our knowledge, no existing algorithm is designed to directly estimate minority sub-regions for the class imbalance problem. We therefore propose a new grouping algorithm, named Direction Distribution-based Minority Sub-region Estimation (DDMSE). The algorithm exploits the intuitive observation that minority samples in the same sub-region lie in almost the same direction relative to the majority samples, and uses it to estimate minority sub-regions while avoiding the negative impact of the distance factor found in clustering algorithms. Finally, new synthetic samples are generated within the estimated minority sub-regions. Experimental results on real-world datasets show performance comparable to other state-of-the-art oversampling methods.
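The direction intuition can be illustrated with a minimal sketch. This is not the DDMSE algorithm itself, whose details the abstract does not give. It is a simplified stand-in that rests on several assumptions: directions are measured from the majority centroid, minority samples are grouped greedily by cosine similarity to a group "leader", and new samples are produced by SMOTE-style linear interpolation within a group. The function names and the `cos_threshold` parameter are hypothetical.

```python
import numpy as np

def group_by_direction(minority, majority, cos_threshold=0.9):
    """Greedily group minority samples by their direction from the majority centroid.

    Assumption: the majority centroid serves as the reference point; the paper
    may define direction differently.
    """
    center = majority.mean(axis=0)
    vecs = minority - center
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    units = vecs / np.where(norms == 0, 1.0, norms)  # avoid division by zero

    labels = -np.ones(len(minority), dtype=int)
    leaders = []  # unit direction of the first sample in each group
    for i, u in enumerate(units):
        for g, lead in enumerate(leaders):
            if u @ lead >= cos_threshold:  # similar direction: join this group
                labels[i] = g
                break
        else:
            leaders.append(u)  # no match: start a new group
            labels[i] = len(leaders) - 1
    return labels

def oversample_within_groups(minority, labels, n_new, rng=None):
    """Generate n_new synthetic samples by interpolating between two members of one group."""
    rng = np.random.default_rng(rng)
    new = []
    for _ in range(n_new):
        g = rng.choice(labels)  # groups are picked in proportion to their size
        members = minority[labels == g]
        a = members[rng.integers(len(members))]
        b = members[rng.integers(len(members))]
        new.append(a + rng.random() * (b - a))
    return np.array(new)

# Two minority sub-regions, each containing samples that are far apart
# in distance but share a direction from the majority class.
majority = np.array([[-0.1, -0.1], [0.1, 0.1], [0.1, -0.1], [-0.1, 0.1]])
minority = np.array([[1.0, 0.0], [5.0, 0.1], [0.0, 1.0], [0.1, 4.0]])
labels = group_by_direction(minority, majority)
synthetic = oversample_within_groups(minority, labels, n_new=10, rng=0)
```

In this toy data, the samples at (1, 0) and (5, 0.1) are far apart, so a distance-based clustering could split them, yet they share a direction from the majority and land in the same group here. The cosine threshold controls how wide each directional group is and would need tuning on real data.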