As our understanding of the microbiome has expanded, so has the recognition of its critical role in human health and disease, thereby emphasizing the importance of testing whether microbes are… Click to show full abstract
As our understanding of the microbiome has expanded, so has the recognition of its critical role in human health and disease, thereby emphasizing the importance of testing whether microbes are associated with environmental factors or clinical outcomes. However, many of the fundamental challenges that concern microbiome surveys arise from statistical and experimental design issues, such as the sparse and overdispersed nature of microbiome count data and the complex correlation structure among samples. For example, in the human microbiome project (HMP) dataset, the repeated observations across time points (level 1) are nested within body sites (level 2), which are further nested within subjects (level 3). Therefore, there is a great need for the development of specialized and sophisticated statistical tests. In this paper, we propose multilevel zero-inflated negative-binomial models for association analysis in microbiome surveys. We develop a variational approximation method for maximum likelihood estimation and inference. It uses optimization, rather than sampling, to approximate the log-likelihood and compute parameter estimates, provides a robust estimate of the covariance of parameter estimates and constructs a Wald-type test statistic for association testing. We evaluate and demonstrate the performance of our method using extensive simulation studies and an application to the HMP dataset. We have developed an R package MZINBVA to implement the proposed method, which is available from the GitHub repository https://github.com/liudoubletian/MZINBVA.
               
Click one of the above tabs to view related content.