"Clustering of mixed datasets using deep learning algorithm"

Abstract The performance of a clustering algorithm is highly dependent on the quality and quantity of the training dataset. Deep learning is one of the most popular and successful techniques for high-quality clustering of datasets. Most datasets contain a mix of numeric and categorical attributes, and clustering such different types of data is a complex problem. Deep learning methods, the state-of-the-art classifiers, with better learning procedures and computational resources, can fill these gaps. To improve the robustness of clusters, we propose a Constraint-Based Deep Convolutional Generative Adversarial Network (CB-DCGANs) framework that generates simulated data to augment the training set and thereby improve the performance of the clustering algorithm. We evaluated the performance of an end-to-end Deep Convolutional Neural Network (DCNN) in detecting the clusters in given datasets. CB-DCGANs with DCNN yielded a baseline accuracy of 0.8853 on the heart disease dataset. On the chemoinformatics datasets, the proposed algorithm yielded accuracies of 0.965 for the Kaggle dataset, 0.987 for the factors dataset, and 0.952 for the kinase dataset. This study shows that using generative adversarial networks for clustering augmentation can significantly improve performance, especially in real-life applications.

Keywords: algorithm; mixed datasets; deep learning; datasets using; performance; clustering mixed

Journal Title: Chemometrics and Intelligent Laboratory Systems
Year Published: 2020
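
To make the mixed-attribute problem from the abstract concrete, the following is a minimal sketch of one common way to turn records with numeric and categorical attributes into plain numeric vectors before they are fed to a network. It is not the preprocessing used in the paper, which the abstract does not describe. The function name `encode_mixed` and the field names `age` and `chest_pain` are hypothetical and chosen only for illustration.

```python
def encode_mixed(rows, numeric_keys, categorical_keys):
    """Encode dict records with mixed attributes as lists of floats.

    Assumption: min-max scaling for numeric attributes and one-hot
    encoding for categorical ones. This is a generic technique, not
    the method from the paper.
    """
    # Observed (min, max) range of every numeric attribute.
    ranges = {}
    for key in numeric_keys:
        values = [row[key] for row in rows]
        ranges[key] = (min(values), max(values))

    # Sorted vocabulary of every categorical attribute, so the
    # position of each one-hot slot is deterministic.
    vocab = {key: sorted({row[key] for row in rows}) for key in categorical_keys}

    encoded = []
    for row in rows:
        vector = []
        for key in numeric_keys:
            low, high = ranges[key]
            # A constant column carries no information; map it to 0.0.
            vector.append((row[key] - low) / (high - low) if high > low else 0.0)
        for key in categorical_keys:
            vector.extend(1.0 if row[key] == cat else 0.0 for cat in vocab[key])
        encoded.append(vector)
    return encoded


# Hypothetical records, invented for this example only.
records = [
    {"age": 40, "chest_pain": "typical"},
    {"age": 60, "chest_pain": "atypical"},
    {"age": 50, "chest_pain": "typical"},
]
vectors = encode_mixed(records, ["age"], ["chest_pain"])
```

Each resulting vector holds the scaled numeric value followed by one slot per category, so every attribute lies in the range 0 to 1 and distance-based or neural methods can treat the columns uniformly.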
