Semi-supervised clustering (SSC) aims to improve clustering performance with the support of prior knowledge (i.e., side information). Compared with pairwise constraints, partial labeling information characterizes the data distribution more naturally, at a higher level. However, existing SSC methods do not adequately account for the natural gap between class information and clustering when they use partial labeling information to guide the clustering procedure. To address this problem, we present a “compact-cluster” assumption for SSC that exploits partial labeling information through a cluster-splitting technique. Based on this assumption, we propose a general framework, CSSC, which supervises traditional clustering through an objective function that incorporates a term measuring the compactness of the clusters. Furthermore, we provide two effective solutions, one for K-means and one for spectral clustering, within the CSSC framework, and derive the corresponding algorithms to find the optimal number of clusters and their centroids. Theoretical analyses demonstrate the feasibility and effectiveness of the proposed method. Finally, extensive experiments on eight real-world datasets demonstrate the superiority of our method over other state-of-the-art SSC methods.
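The abstract does not state the CSSC objective or its update rules. The sketch below is therefore only a hypothetical illustration of the general idea behind cluster splitting: one labeled class may consist of several compact clusters, so each class's labeled points are first split into sub-clusters, and the sub-cluster centroids then seed a K-means run in which labeled points may only join sub-clusters of their own class. This is not the authors' algorithm. The function names, the `tol` compactness threshold (mean within-cluster squared distance), the `max_sub` limit, and the farthest-point initialization are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[List[Point]]:
    # Plain K-means with deterministic farthest-point initialization.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    groups: List[List[Point]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: sq_dist(p, centers[i]))].append(p)
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return [g for g in groups if g]


def split_class(points: List[Point], max_sub: int, tol: float) -> List[Point]:
    # Hypothetical cluster-splitting step: use the fewest sub-clusters whose
    # mean within-cluster squared distance is at most `tol`.
    groups = [points]
    for k in range(1, min(max_sub, len(points)) + 1):
        groups = kmeans(points, k)
        spread = sum(sq_dist(p, mean(g)) for g in groups for p in g) / len(points)
        if spread <= tol:
            break
    return [mean(g) for g in groups]


def semi_supervised_kmeans(labeled: Dict[str, List[Point]], unlabeled: List[Point],
                           max_sub: int = 3, tol: float = 1.0, iters: int = 50):
    # Seed one centroid per sub-cluster; `owner[i]` is the class of centroid i.
    # Each class needs at least one labeled point.
    centers: List[Point] = []
    owner: List[str] = []
    for cls, pts in labeled.items():
        for c in split_class(pts, max_sub, tol):
            centers.append(c)
            owner.append(cls)
    data: List[Tuple[Point, Optional[str]]] = (
        [(p, cls) for cls, pts in labeled.items() for p in pts]
        + [(p, None) for p in unlabeled])
    assign: List[int] = []
    for _ in range(iters):
        assign = []
        for p, cls in data:
            # Labeled points may only join sub-clusters of their own class.
            allowed = [i for i in range(len(centers)) if cls is None or owner[i] == cls]
            assign.append(min(allowed, key=lambda i: sq_dist(p, centers[i])))
        new = []
        for i in range(len(centers)):
            members = [data[j][0] for j, a in enumerate(assign) if a == i]
            new.append(mean(members) if members else centers[i])
        if new == centers:
            break
        centers = new
    # Predicted class for each unlabeled point, via the owner of its sub-cluster.
    labels = [owner[a] for a in assign[len(data) - len(unlabeled):]]
    return centers, owner, labels
```

The following call is a usage example. Class `a` is made of two distant compact groups, so the splitting step gives it two sub-clusters instead of forcing a single wide cluster:

```python
labeled = {"a": [(0, 0), (0, 1), (10, 0), (10, 1)], "b": [(5, 10), (5, 11)]}
centers, owner, labels = semi_supervised_kmeans(labeled, [(0, 0.5), (10, 0.5), (5, 10.5)])
print(owner, labels)  # ['a', 'a', 'b'] ['a', 'a', 'b']
```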
               
Click one of the above tabs to view related content.