Semi-supervised clustering is an important research topic in cluster analysis; it uses prior knowledge as constraints to improve clustering performance. When clustering a data set, people often obtain prior constraints from different information sources, which may differ in representation and content, to guide the clustering process. However, most existing semi-supervised clustering algorithms are based on single-source constraints and rarely integrate multi-source constraints to enhance clustering quality. To solve this problem, we analyze the relations among different types of constraints and propose a uniform representation for them. Based on it, we propose a new semi-supervised clustering algorithm that finds a clustering with good cluster structure and high consensus across all sources of constraints. In the algorithm, we formulate an optimization objective and give a method for solving it. The algorithm integrates multi-source constraints well, which reduces the effect of incorrect constraints from any single source and yields a high-quality clustering. Experimental studies on several benchmark data sets illustrate the effectiveness of the proposed algorithm compared with other semi-supervised clustering algorithms.
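
The abstract does not spell out the uniform representation or the objective, so the following is only an illustrative sketch and not the authors' method. It assumes that both partial class labels and pairwise must-link/cannot-link constraints can be encoded as a symmetric matrix with entries +1 (same cluster), -1 (different clusters), and 0 (unknown), and that sources are combined by simple averaging. The function names `labels_to_matrix`, `pairs_to_matrix`, and `consensus` are hypothetical.

```python
import numpy as np

def labels_to_matrix(partial_labels, n):
    """Encode partial labels {index: label} as an n x n pairwise matrix.

    Assumed encoding (not from the paper): +1 same label, -1 different, 0 unknown.
    """
    m = np.zeros((n, n))
    items = list(partial_labels.items())
    for a, (i, label_i) in enumerate(items):
        for j, label_j in items[a + 1:]:
            m[i, j] = m[j, i] = 1.0 if label_i == label_j else -1.0
    return m

def pairs_to_matrix(must_link, cannot_link, n):
    """Encode must-link / cannot-link pairs in the same assumed matrix form."""
    m = np.zeros((n, n))
    for i, j in must_link:
        m[i, j] = m[j, i] = 1.0
    for i, j in cannot_link:
        m[i, j] = m[j, i] = -1.0
    return m

def consensus(matrices):
    """Average the per-source matrices (a stand-in for the paper's objective)."""
    return np.mean(matrices, axis=0)

n = 4
source_a = labels_to_matrix({0: "x", 1: "x", 2: "y"}, n)  # label-based source
source_b = pairs_to_matrix([(0, 1)], [(1, 2)], n)         # pairwise source
source_c = pairs_to_matrix([], [(0, 1)], n)               # source with one wrong constraint

agreement = consensus([source_a, source_b, source_c])
# Entry (0, 1) stays positive: two sources outvote the incorrect cannot-link.
print(agreement[0, 1] > 0, agreement[1, 2] < 0)
```

Under these assumptions, averaging illustrates why combining sources can dampen an incorrect constraint from a single source; a real algorithm would also weigh the constraints against the cluster structure of the data.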
               