Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application

Background
Genotype-phenotype association has been one of the long-standing problems in bioinformatics. Identifying both the marginal and epistatic effects among genetic markers, such as Single Nucleotide Polymorphisms (SNPs), has been extensively integrated into Genome-Wide Association Studies (GWAS) to help derive “causal” genetic risk factors and their interactions, which play critical roles in life and disease systems. Identifying “synergistic” interactions with respect to the outcome of interest can support accurate phenotypic prediction and help explain the underlying mechanisms of system behavior. Many statistical measures for estimating synergistic interactions have been proposed in the literature for this purpose. However, beyond their empirical performance, there is still no theoretical analysis of the power and limitations of these synergistic interaction measures.

Results
In this paper, we show that the existing information-theoretic multivariate synergy depends on only a small subset of the interaction parameters in the model, sometimes on a single interaction parameter. In addition, we propose an adjusted version of multivariate synergy as a new measure of interactive effects and demonstrate its effectiveness with experiments on both simulated data sets and a real-world GWAS data set.

Conclusions
We provide rigorous theoretical analysis and empirical evidence on why the information-theoretic multivariate synergy helps identify genetic risk factors via synergistic interactions. We further establish a rigorous sample complexity analysis for detecting interactive effects, confirmed on both simulated and real-world data sets.
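The abstract does not spell out the synergy formula. The sketch below assumes the commonly used pairwise form of information-theoretic synergy, Syn(X1, X2; Y) = I(X1, X2; Y) - I(X1; Y) - I(X2; Y), estimated with plug-in (empirical) entropies from discrete genotype and phenotype samples. The function names are hypothetical illustrations, not the authors' code, and the paper's multivariate measure and its adjusted version may differ from this two-feature case. It shows why synergy flags interactions: in a pure XOR-style interaction, neither marker alone carries information about the phenotype, yet together they determine it.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(samples: Sequence[Hashable]) -> float:
    """Plug-in (empirical) Shannon entropy in bits."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Estimate I(X; Y) = H(X) + H(Y) - H(X, Y) from paired samples."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)


def pairwise_synergy(x1: Sequence[Hashable], x2: Sequence[Hashable],
                     y: Sequence[Hashable]) -> float:
    """Assumed pairwise synergy: I((X1, X2); Y) - I(X1; Y) - I(X2; Y).

    Positive values mean the pair tells us more about Y jointly than the
    two markers do separately (a hypothetical helper, for illustration).
    """
    x12 = list(zip(x1, x2))  # treat the marker pair as one joint variable
    return (mutual_information(x12, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy binary "genotypes" covering all four combinations once.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]

# Pure interaction: phenotype is the XOR of the two markers.
y_xor = [a ^ b for a, b in zip(x1, x2)]
print(round(pairwise_synergy(x1, x2, y_xor), 6))   # 1.0

# Marginal effect only: phenotype copies the first marker.
y_main = list(x1)
print(round(pairwise_synergy(x1, x2, y_main), 6))  # 0.0
```

Plug-in entropy estimates are biased on small samples, which is one reason sample complexity matters when such measures are applied to real GWAS data; the toy inputs here are exact enumerations, so no estimation error arises.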

Keywords: multivariate synergy; logistic regression; synergistic interactions; feature selection; selection interactions; interactions logistic

Journal Title: BMC Genomics
Year Published: 2018
