Abstract Ensemble clustering has emerged as a powerful tool for improving the stability and accuracy of the clustering task. Although various approaches have been proposed for improving the performance of these algorithms, most of them ignore two crucial pieces of information provided by the base clusterings. First, some samples of the input data may be outliers that lie on the boundaries of the clusters and can easily be partitioned into different clusters. Second, must-link information exists amongst some instances. In this paper, we develop a novel ensemble method that utilizes a dense representation model to construct a pairwise similarity matrix and then obtains the final ensemble clustering result via Ncut. In particular, a robust loss function is used in the proposed model, which weakens the effect of outliers. As the model is convex but non-smooth, we propose a customized re-weighted optimization method and theoretically prove that the solution it provides is the globally optimal solution of the original problem. Furthermore, by analysing the particular structure of the input clusterings, we introduce a slimming strategy, which utilizes the must-link information amongst instances to reduce the size of the input data and thereby reduce the time cost of constructing the similarity matrix. Numerous experimental results on real datasets demonstrate the advantages of the proposed method over state-of-the-art algorithms.
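The slimming idea can be illustrated with a minimal sketch. It rests on an assumption the abstract does not spell out: that two instances are "must-link" when every base clustering places them in the same cluster, so that instances with identical label vectors can be merged into one representative. The function names `slim_by_must_link` and `co_association` are hypothetical, and the co-association matrix used here is a standard stand-in for a pairwise similarity matrix, not the paper's dense representation model.

```python
import numpy as np

def slim_by_must_link(base_labels):
    """Merge instances that share a cluster in every base clustering.

    base_labels: (n_samples, n_clusterings) integer array, one column per
    base clustering. Assumption: identical rows encode must-link instances.
    """
    base_labels = np.asarray(base_labels)
    # Unique rows are the representatives; group_of maps each sample to one.
    reps, group_of, counts = np.unique(
        base_labels, axis=0, return_inverse=True, return_counts=True
    )
    return reps, group_of.ravel(), counts

def co_association(labels):
    """Fraction of base clusterings in which two rows share a cluster.

    Stand-in similarity, not the dense representation model of the paper.
    """
    labels = np.asarray(labels)
    same = labels[:, None, :] == labels[None, :, :]
    return same.mean(axis=2)

# Five samples, two base clusterings (toy data for illustration only).
base_labels = np.array([[0, 1],
                        [0, 1],
                        [1, 0],
                        [1, 0],
                        [1, 1]])

reps, group_of, counts = slim_by_must_link(base_labels)
S_small = co_association(reps)  # 3 x 3 instead of 5 x 5
```

In this toy case the five samples collapse to three representatives, so the similarity matrix shrinks from 5 x 5 to 3 x 3. A final partition of the representatives (for example via a normalized-cut style spectral method) could then be mapped back to the original samples through `group_of`.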
               