Abstract Motivation Multilevel molecular profiling of tumors and the integrative analysis with clinical outcomes have enabled a deeper characterization of cancer treatment. Mediation analysis has emerged as a promising statistical… Click to show full abstract
Abstract Motivation Multilevel molecular profiling of tumors and the integrative analysis with clinical outcomes have enabled a deeper characterization of cancer treatment. Mediation analysis has emerged as a promising statistical tool to identify and quantify the intermediate mechanisms by which a gene affects an outcome. However, existing methods lack a unified approach to handle various types of outcome variables, making them unsuitable for high-throughput molecular profiling data with highly interconnected variables. Results We develop a general mediation analysis framework for proteogenomic data that include multiple exposures, multivariate mediators on various scales of effects as appropriate for continuous, binary and survival outcomes. Our estimation method avoids imposing constraints on model parameters such as the rare disease assumption, while accommodating multiple exposures and high-dimensional mediators. We compare our approach to other methods in extensive simulation studies at a range of sample sizes, disease prevalence and number of false mediators. Using kidney renal clear cell carcinoma proteogenomic data, we identify genes that are mediated by proteins and the underlying mechanisms on various survival outcomes that capture short- and long-term disease-specific clinical characteristics. Availability and implementation Software is made available in an R package (https://github.com/longjp/mediateR). Supplementary information Supplementary data are available at Bioinformatics online.
               
Click one of the above tabs to view related content.