LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

PSPLPA: Probability and similarity based parallel label propagation algorithm on spark

Abstract With the rapid growth of social network, the cost of computation is increasing. Many existing algorithms are not suitable for the large-scale data. Apache Spark is an open-source cluster… Click to show full abstract

Abstract With the rapid growth of social network, the cost of computation is increasing. Many existing algorithms are not suitable for the large-scale data. Apache Spark is an open-source cluster computing framework that empowers us to solve the problem of community detection in a cluster of computer. In this paper, we propose a novel label propagation algorithm on Spark, called PSPLPA (Probability and similarity based Parallel label propagation algorithm). PSPLPA employs a new label updating strategy using probability in the label propagation procedure during each iteration. First, weight calculation, which is based on k-shell, is integrated into the label initialization process. Second, parallel propagation steps are comprehensively proposed to utilize label probability efficiently. Third, randomness in label updating is significantly reduced via automatic label selection and similarity computation. Experiments conducted on artificial and real social networks demonstrate that the proposed algorithm exhibits high scalability and high accuracy.

Keywords: label; label propagation; propagation algorithm; propagation; probability; spark

Journal Title: Physica A: Statistical Mechanics and its Applications
Year Published: 2018

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.