Community recovery is a central problem that arises in a wide variety of applications such as network clustering, motion segmentation, face clustering, and protein complex detection. The objective is to cluster data points into distinct communities based on a set of measurements, each of which is associated with the values of a certain number of data points. While most prior works focus on a setting in which each measurement involves two data points, this paper explores a generalized setting in which a measurement can involve more than two. Motivated by applications in machine learning and channel coding in particular, we consider two types of measurements: 1) a homogeneity measurement, which indicates whether or not the associated data points belong to the same community, and 2) a parity measurement, which denotes the modulo-2 sum of the values of the data points. Such measurements are possibly corrupted by Bernoulli noise. We characterize the fundamental limits on the number of measurements required to reconstruct the communities for the considered models.
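To make the two measurement types concrete, the following is a minimal sketch of how a single noisy measurement might be simulated. It is illustrative only and not taken from the paper: it assumes binary community labels (0 or 1), models the Bernoulli noise as flipping the clean measurement with probability `theta`, and the names `homogeneity`, `parity`, `noisy_measurement`, and `theta` are all hypothetical.

```python
import random


def homogeneity(values):
    """Return 1 if all selected data points carry the same label, else 0."""
    return int(len(set(values)) == 1)


def parity(values):
    """Return the modulo-2 sum of the selected (binary) labels."""
    return sum(values) % 2


def noisy_measurement(labels, indices, kind, theta, rng):
    """Measure the data points at `indices`, then flip the result with
    probability `theta` (an assumed form of the Bernoulli noise)."""
    values = [labels[i] for i in indices]
    clean = homogeneity(values) if kind == "homogeneity" else parity(values)
    flip = 1 if rng.random() < theta else 0
    return clean ^ flip


# Example: five data points in two communities, one measurement over three of them.
labels = [0, 1, 1, 0, 1]
rng = random.Random(0)
y_hom = noisy_measurement(labels, [1, 2, 4], "homogeneity", 0.1, rng)
y_par = noisy_measurement(labels, [0, 1, 2], "parity", 0.1, rng)
```

Under these assumptions, recovering the communities amounts to inferring `labels` from many such noisy outputs, and the question the abstract poses is how many measurements are needed for that to be possible.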