"Efficient distributed clustering using boundary information"

Abstract In the era of big data, it is increasingly common for large amounts of data to be generated across multiple distributed sites, where they cannot be gathered into a centralized site for further analysis. This invalidates the assumption of traditional clustering techniques, which are based on centralized models. The major challenge is that these distributed datasets cannot be trivially merged because of privacy concerns, limited network bandwidth among sites, and the limited computational capacity of a single site. To tackle this challenge, we propose an efficient distributed clustering scheme using boundary information (DCUBI), which is both flexible and scalable. The main procedure of DCUBI consists of three steps: local, global, local. First, each local site extracts the boundary points from its own data and applies traditional clustering to the boundary points only. Second, the labeled boundary points from each site are sent to the central site as local representatives, where boundary and cluster fusion is conducted to form the global clustering model. Finally, the global boundary and cluster information is sent back to each local site for refined local clustering. To demonstrate the effectiveness of DCUBI, we plug the well-known DBSCAN algorithm into DCUBI and conduct comprehensive experiments on datasets with different properties. The experimental results verify the clustering quality of DCUBI as well as its superior time efficiency when the volume of data at each site is large. Furthermore, other popular clustering techniques, especially those with high time complexity such as spectral clustering and affinity propagation, are also plugged into DCUBI to demonstrate the flexibility of the proposed scheme.
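The local, global, local procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how boundary points are extracted, how fusion is performed, or how local clustering is refined, so everything below is an assumption made for illustration: a neighbor-count heuristic stands in for boundary extraction, plain eps-connectivity stands in for DBSCAN, clusters are fused when their boundary points lie within eps of each other, and each local point takes the label of its nearest global boundary point. All function and parameter names are hypothetical.

```python
import math

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i], excluding i itself."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]

def extract_boundary(points, eps, max_neighbors):
    """Assumed heuristic: a point with few eps-neighbors lies on a boundary."""
    return [p for i, p in enumerate(points)
            if len(neighbors(points, i, eps)) <= max_neighbors]

def connect(points, eps):
    """Label points by eps-connectivity (a simplified stand-in for DBSCAN)."""
    labels = [None] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:  # flood-fill the connected component
            k = stack.pop()
            for j in neighbors(points, k, eps):
                if labels[j] is None:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def local_step(points, eps, max_neighbors):
    """Step 1 (local): cluster boundary points only; return (point, label) pairs."""
    boundary = extract_boundary(points, eps, max_neighbors)
    return list(zip(boundary, connect(boundary, eps)))

def global_step(site_reports, eps):
    """Step 2 (global): fuse local clusters whose boundary points are within eps."""
    tagged = [(p, (site, label))
              for site, report in enumerate(site_reports)
              for p, label in report]
    parent = {key: key for _, key in tagged}

    def find(key):  # union-find root lookup
        while parent[key] != key:
            key = parent[key]
        return key

    for p, a in tagged:
        for q, b in tagged:
            if math.dist(p, q) <= eps:
                parent[find(a)] = find(b)

    ids = {}  # map each fused root to a compact global label
    return [(p, ids.setdefault(find(key), len(ids))) for p, key in tagged]

def refine(points, global_model):
    """Step 3 (local): each point takes the label of its nearest global boundary point."""
    return [min(global_model, key=lambda m: math.dist(p, m[0]))[1]
            for p in points]

def dcubi_sketch(sites, eps, max_neighbors):
    """Run the three steps over a list of sites, each a list of coordinate tuples."""
    reports = [local_step(points, eps, max_neighbors) for points in sites]
    model = global_step(reports, eps)
    return model, [refine(points, model) for points in sites]
```

In this sketch only the labeled boundary points leave a site, which is the property the abstract relies on to limit communication and central computation. The heuristic only behaves sensibly when the boundary points of a cluster are dense enough to stay eps-connected.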

Keywords: efficient distributed; information; using boundary; dcubi; distributed clustering; site

Journal Title: Neurocomputing
Year Published: 2018
