BACKGROUND Gene set analysis is a popular approach to examine the association between a predefined gene set and a phenotype. Few such methods have been developed for a continuous phenotype. Moreover, often not all the genes within a significant gene set contribute to its significance, and no gene set reduction method has been developed for a continuous phenotype. We developed a computationally efficient analytical tool, called the linear combination test for gene set reduction (LCT-GSR), to identify core subsets of gene sets associated with a continuous phenotype. Identifying the core subset enhances our understanding of the biological mechanism and reduces the costs of disease risk assessment, diagnosis and treatment.

RESULTS We evaluated the performance of our analytical tool by applying it to two real microarray studies. In the first application, we analyzed pathway expression measurements in newborns' blood to discover core genes contributing to the variation in birth weight. On average, we were able to reduce the number of genes in the 33 significant gene sets of embryonic stem cell signatures by 84.3%, resulting in 229 unique genes. Using immunologic signatures, on average we reduced the number of genes in the 210 significant gene sets by 89%, leading to 1603 unique genes. There were 180 unique core genes overlapping across the two databases. In the second application, we analyzed pathway expression measurements in a cohort of lethal prostate cancer patients from the Swedish Watchful Waiting cohort to identify the main genes associated with tumor volume. On average, we were able to reduce the number of genes in the 17 gene sets by 90%, resulting in 47 unique genes.

CONCLUSIONS We conclude that LCT-GSR is a statistically sound analytical tool that can be used to extract core genes associated with a continuous phenotype. It can be applied to a wide range of studies in which dichotomizing the continuous phenotype is neither easy nor meaningful.
Reduction to the most predictive genes is crucial in advancing our understanding of issues such as disease prevention, faster and more efficient diagnosis, intervention strategies and personalized medicine.