Heterogeneous information networks (e.g. cloud service relation networks and social networks), in which objects of multiple types are interconnected, can be modeled as big graphs. A major challenge for clustering in such big graphs is their complex structure, which can yield different clustering results that carry diverse semantic meanings. To generate the desired clustering, we propose a parallel clustering method for heterogeneous information networks on an efficient graph computation system (Spark). We use a multi-relation, path-based method to create similarity matrices, and implement our method on a graph computation model. Directly using existing data-parallel tools (e.g. Hadoop) for graph computation tasks is inefficient, and some graph-parallel tools (e.g. Pregel) do not effectively address the challenges of graph construction and transformation. We therefore implemented our parallel method on the Spark system. The experimental clustering results show that our method is more accurate.
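
The abstract does not specify how the path-based similarity matrices are computed. As a minimal single-machine sketch of the general idea, the code below uses PathSim, a well-known meta-path similarity measure that is swapped in here for illustration and is not confirmed to be the authors' method. The toy service-user matrix, the function names, and the weighted combination of relations are all assumptions.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pathsim(adjacency):
    """PathSim over the symmetric meta-path X -> Y -> X.

    adjacency[i][j] counts links between object i of type X and object j
    of type Y. The commuting matrix M = A * A^T counts path instances, and
    sim(i, j) = 2 * M[i][j] / (M[i][i] + M[j][j]).
    """
    commuting = matmul(adjacency, transpose(adjacency))
    n = len(commuting)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = commuting[i][i] + commuting[j][j]
            sim[i][j] = 2.0 * commuting[i][j] / denom if denom else 0.0
    return sim


def combine(similarities, weights):
    """Weighted sum of per-relation similarity matrices (assumed scheme)."""
    n = len(similarities[0])
    return [
        [sum(w * s[i][j] for s, w in zip(similarities, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 services (rows) linked to 2 users (columns).
service_user = [
    [1, 1],
    [1, 0],
    [0, 1],
]
similarity = pathsim(service_user)
```

The resulting matrix could then be passed to any similarity-based clustering step. In a distributed setting the matrix products would be expressed over partitioned graph data rather than nested lists.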
               