With the rapid development of the Internet, many fields have accumulated large amounts of data. Much of this data is organized as heterogeneous information networks (HINs), so analyzing HINs efficiently is essential. Several HIN clustering algorithms have been proposed in recent years, most of them based on meta-paths. Because instances of meta-paths can connect target objects directly, these algorithms consider, while clustering, only the relationship between two target objects that are directly connected by an instance of a meta-path. As a result, the information contained in the relationship between two target objects that are not directly connected by a meta-path instance is neglected, even though such pairs may help to obtain a better clustering result. To take these target object pairs into consideration, the proposed algorithm applies a structural neighbor searching method, so that all indirectly connected neighbors of the target objects are considered during clustering. Moreover, since more than one meta-path can be found in a heterogeneous information network and each meta-path affects the clustering result to a different degree, a weight should be assigned to each meta-path to represent its relative importance. A method for calculating these weights is proposed and used by the algorithm described in this paper. From the meta-path weights and the structural neighbors of the target objects, a vector is calculated for every target object, and a hierarchical clustering procedure is then performed on these vectors.
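The pipeline described above (structural neighbors, weighted meta-paths, per-object vectors, hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract gives neither the weight calculation nor the exact vector definition, so everything concrete here is an assumption made for illustration. In particular, the toy authors `a1` to `a6`, the meta-path names `APA` and `APVPA`, the fixed weights 0.7 and 0.3, the two-hop search limit, the 1/hops discount for indirect neighbors, and the average-linkage merge rule are all hypothetical.

```python
from collections import deque
from itertools import combinations
from math import dist


def hop_distances(adj, start, max_hops):
    """Breadth-first search: hop count from start to each object within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


def object_vectors(objects, metapath_adj, weights, max_hops=2):
    """Build one vector per target object from weighted structural neighbors.

    Assumption: a neighbor reached in h hops along a meta-path contributes
    weight / h, so indirect neighbors count, but less than direct ones.
    """
    vectors = {}
    for obj in objects:
        vec = [0.0] * len(objects)
        for name, adj in metapath_adj.items():
            hops = hop_distances(adj, obj, max_hops)
            for j, other in enumerate(objects):
                if other == obj:
                    vec[j] += weights[name]  # self-similarity
                elif other in hops:
                    vec[j] += weights[name] / hops[other]
        vectors[obj] = vec
    return vectors


def agglomerative(vectors, n_clusters):
    """Bottom-up hierarchical clustering with average linkage (Euclidean)."""
    clusters = [[obj] for obj in vectors]

    def linkage(a, b):
        total = sum(dist(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical toy data: direct links between target objects per meta-path.
objects = ["a1", "a2", "a3", "a4", "a5", "a6"]
metapath_adj = {
    "APA": {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"},
            "a4": {"a5"}, "a5": {"a4", "a6"}, "a6": {"a5"}},
    "APVPA": {"a1": {"a3"}, "a3": {"a1"}, "a4": {"a6"}, "a6": {"a4"}},
}
weights = {"APA": 0.7, "APVPA": 0.3}  # assumed, not computed as in the paper

vectors = object_vectors(objects, metapath_adj, weights)
clusters = agglomerative(vectors, n_clusters=2)
print(clusters)
```

In this toy network `a1` and `a3` are not directly linked by `APA`, yet the two-hop search still credits `a3` in the vector of `a1` through `a2`, which is the kind of indirect pair the abstract argues should not be ignored. The weights are fixed by hand here only because the abstract does not state how they are calculated.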