K-means is a fundamental clustering algorithm widely used in both academic and industrial applications. Its popularity can be attributed to its simplicity and efficiency. Studies show the equivalence of K-means… Click to show full abstract
K-means is a fundamental clustering algorithm widely used in both academic and industrial applications. Its popularity can be attributed to its simplicity and efficiency. Studies show the equivalence of K-means to principal component analysis, non-negative matrix factorization, and spectral clustering. However, these studies focus on standard K-means with squared Euclidean distance. In this review paper, we unify the available approaches in generalizing K-means to solve challenging and complex problems. We show that these generalizations can be seen from four aspects: data representation, distance measure, label assignment, and centroid updating. As concrete applications of transforming problems into modified K-means formulation, we review the following applications: iterative subspace projection and clustering, consensus clustering, constrained clustering, domain adaptation, and outlier detection.
               
Click one of the above tabs to view related content.