Abstract High-dimensional multiview data arise in many big data applications. Classic clustering algorithms handle such data poorly because they treat all features as equally relevant. To tackle this problem, this paper proposes a novel intelligent weighting k-means clustering (IWKM) algorithm based on swarm intelligence. First, the degree of coupling between clusters is introduced into the clustering model to enlarge the dissimilarity between clusters, and weights of views and features are used in the weighted distance function to assign objects to clusters. Second, to eliminate sensitivity to the initial cluster centers, swarm intelligence is used to find the initial cluster centers, the view weights, and the feature weights by a global search. Lastly, a precise perturbation is proposed to improve the optimization performance of swarm intelligence. To verify clustering performance on high-dimensional multiview data, experiments were carried out on five big data applications, using the evaluation metrics Rand Index, Jaccard Coefficient, and Folkes Russe, on two computational platforms: Apache Spark and a single node. The experimental results show that IWKM is effective and efficient in clustering high-dimensional multiview data, and that it outperforms five other approaches on these complicated data sets with more views and higher dimensions, on both Apache Spark and a single node.
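
The role of view and feature weights in the distance function can be illustrated with a minimal sketch. The abstract does not give the exact form of the IWKM distance, so the code below assumes a squared Euclidean distance in which each feature is scaled by the product of its view weight and its own feature weight. The function name `assign_clusters`, its parameters, and the toy data are hypothetical and are not taken from the paper; the coupling term, the swarm-intelligence search, and the perturbation step are not modeled.

```python
import numpy as np


def assign_clusters(X, centers, view_of_feature, view_weights, feature_weights):
    """Assign each object to its nearest center under a weighted distance.

    Assumed (hypothetical) distance, not the paper's exact formula:
        d(x, c) = sum_j  view_weights[view_of_feature[j]]
                         * feature_weights[j] * (x_j - c_j) ** 2
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Combined weight per feature: weight of its view times its own weight.
    w = np.asarray(view_weights)[np.asarray(view_of_feature)] * np.asarray(feature_weights)
    # Pairwise differences between objects and centers: shape (n, k, d).
    diff = X[:, None, :] - centers[None, :, :]
    # Weighted squared distances: shape (n, k).
    dist = (w * diff ** 2).sum(axis=2)
    return dist.argmin(axis=1)


# Toy data: view 0 holds features 0 and 1 (informative),
# view 1 holds feature 2 (noisy).
X = np.array([
    [0.0, 0.0, 9.0],
    [0.1, 0.0, 0.0],
    [5.0, 5.0, 0.0],
    [5.1, 5.0, 9.0],
])
centers = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 5.0, 9.0],
])
view_of_feature = np.array([0, 0, 1])

# Equal relevance for all views and features, as in classic k-means.
equal_labels = assign_clusters(
    X, centers, view_of_feature,
    view_weights=np.array([1.0, 1.0]),
    feature_weights=np.array([1.0, 1.0, 1.0]),
)

# Down-weighting the noisy view changes the assignment.
weighted_labels = assign_clusters(
    X, centers, view_of_feature,
    view_weights=np.array([0.9, 0.1]),
    feature_weights=np.array([0.5, 0.5, 1.0]),
)

print("equal weights:   ", equal_labels)
print("weighted views:  ", weighted_labels)
```

With equal weights, the noisy third feature dominates the distance and pulls the first and third objects toward the wrong center. Once the second view is down-weighted, the objects group by the two informative features. In IWKM the weights are found by the global search rather than set by hand as in this sketch.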
               