Genetic association study (GAS) is a promising tool for detecting and analyzing the causes of complex diseases. The extreme learning machine (ELM) has been successfully applied in a variety of research fields. Yet, as a black-box method, it is not suited to GAS on its own, because it cannot tell us what causes the diseases, which is crucial to biologists. In this paper, we propose an ELM-based statistically significant pattern classification framework, which combines ELM with feature vector-based methods to solve the GAS problem efficiently and effectively. In particular: 1) a statistically significant pattern, defined in terms of both family-wise error rate (FWER) and false discovery rate (FDR), is proposed to control false positives in multiple hypothesis tests, which is necessary in GAS but ignored by most existing methods; 2) an upper bound on the significance of a pattern is derived to speed up FWER-constrained statistically significant pattern mining in a row-enumeration manner, and a space-efficient grid index is devised to dramatically improve the efficiency of FDR-constrained pattern discovery; and 3) an ELM classifier is constructed based on the significant patterns. Comprehensive empirical studies on four real genotype datasets demonstrate much higher efficiency and effectiveness of our proposed framework compared with the competing methods.
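To make the distinction between FWER and FDR control concrete, the following is a minimal sketch using two textbook procedures: the Bonferroni correction (FWER) and the Benjamini-Hochberg step-up procedure (FDR). These are standard stand-ins chosen for illustration, not the pattern-mining algorithms proposed in the paper, whose details the abstract does not give. The function names and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """FWER control (Bonferroni): reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """FDR control (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    # Indices of the hypotheses, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, one per candidate pattern (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
fwer_hits = bonferroni_reject(p_values)
fdr_hits = benjamini_hochberg_reject(p_values)
print("FWER (Bonferroni) rejections:", sum(fwer_hits))
print("FDR (Benjamini-Hochberg) rejections:", sum(fdr_hits))
```

On this toy input the FDR procedure rejects more hypotheses than the FWER procedure, which reflects the general trade-off: FWER control is stricter, while FDR control tolerates a bounded expected fraction of false discoveries in exchange for more power.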
               