Many studies apply fuzzy C-means (FCM) to very large data sets by sampling. However, in contrast to the abundance of sampling-based FCM algorithms, few works theoretically study the generalization error. In this article, we adopt the framework of probably approximately correct (PAC) learning and define the generalization error as the maximum difference between the empirical risk and the true risk over solutions in the solution space. First, we analyze this generalization error for a finite solution space and prove two theorems via Hoeffding's inequality, one of which relies on a reasonable hypothesis. Then, we consider the case of an infinite solution space and, under another hypothesis, establish a theorem and a corollary whose proofs build on the finite-space results. Finally, we bound the generalization error from the perspective of the FCM algorithm's convergence: we use a Taylor expansion to linearize the risk function and estimate an upper bound on its Vapnik-Chervonenkis (VC) dimension. The hypotheses proposed in this article all reflect intuitive and common phenomena in practice. Our results give an upper bound on the generalization error that holds with at least a given probability, which offers insight into the stability of sampling-based FCM and can guide its application.
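The quantity studied in the abstract can be illustrated empirically. The sketch below (an illustrative assumption, not the authors' algorithm) runs a plain NumPy implementation of FCM on a random sample of a larger data set and compares the empirical risk on the sample against the risk of the same solution on the full data; their gap is the kind of deviation the paper's bounds control. The function names `fcm` and `risk` and all parameter values are hypothetical choices for this demo.

```python
import numpy as np

def fcm(X, c=3, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means via Bezdek's alternating updates (demo only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        # d[i, j] = distance from point i to center j (jittered to avoid /0)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
        um = u ** m
        # center update: weighted mean of the points
        centers = um.T @ X / um.sum(axis=0)[:, None]
    return centers

def risk(X, centers, m=2.0):
    """Average FCM objective per point: the 'risk' of a solution on data X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    return np.sum((u ** m) * d ** 2) / len(X)

# Synthetic "very large" data set: three Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(mu, 0.5, size=(2000, 2))
               for mu in ([0, 0], [4, 0], [0, 4])])

# Sampling-based FCM: fit centers on a small random sample only.
sample = X[rng.choice(len(X), 300, replace=False)]
centers = fcm(sample)

# Gap between empirical risk (sample) and risk on the full data.
gap = abs(risk(sample, centers) - risk(X, centers))
```

For well-separated clusters and a moderate sample, `gap` is small, matching the intuition that a sample-fitted solution generalizes; the paper's contribution is to bound such gaps with high probability rather than observe them anecdotally.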