"A novel normalization and differential abundance test framework for microbiome data"

MOTIVATION Microbial communities have been shown to be closely associated with many diseases. Identifying differentially abundant microbial species is clinically meaningful for finding disease-related pathogenic or probiotic bacteria. However, certain characteristics of microbiome data have hindered the accuracy and effectiveness of differential abundance analysis. The abundances or counts of microbiome species are usually on different scales and exhibit zero-inflation and overdispersion. Normalization is a crucial step before the differential abundance test. However, existing normalization methods typically try to bring counts on different scales to a common scale by constructing size factors under the assumption that count distributions across samples are equivalent up to a certain percentile. These methods often yield undesirable results when the differentially abundant species are of low to medium abundance. For differential abundance analysis, existing methods often use a single distribution to model the dispersion of all species, which lacks the flexibility to capture the distinctive behavior of an individual species. These methods tend to produce many false positives and often lack power when the effect size is small.

RESULTS We develop a novel framework for differential abundance analysis of sparse, high-dimensional marker gene microbiome data. Our methodology relies on a novel network-based normalization technique and a two-stage zero-inflated mixture count regression model (RioNorm2). Our normalization method aims to find a group of microbiome species that are relatively invariant across samples and conditions, and uses this group to construct the size factor. Another contribution of the paper is that our testing approach takes under-sampling and overdispersion into consideration by separating microbiome species into two groups and modeling them separately. In comprehensive simulation studies, our method is consistently powerful and robust across settings with different sample sizes, library sizes, and effect sizes. We also demonstrate the effectiveness of the framework on a published Metastatic Melanoma dataset and draw biological insights from the results.

AVAILABILITY The R package "RioNorm2" can be installed from GitHub at https://github.com/yuanjing-ma/RioNorm2.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
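The idea of building a size factor from a group of relatively invariant species can be illustrated with a small sketch. This is not the RioNorm2 algorithm or its API: the abstract does not specify how the network is built, so the sketch assumes a simple stand-in that scores each taxon by the median variance of its pairwise log-ratios with the other taxa and keeps the lowest-scoring ones. All function names, the pseudocount, and the toy counts are hypothetical.

```python
from math import log
from statistics import mean, median, pvariance


def log_ratio_variance(counts, i, j, pseudo=1.0):
    """Variance across samples of log(count_i / count_j).

    A small value means taxa i and j keep a stable ratio across samples.
    The pseudocount (an assumption) avoids log(0) for zero counts.
    """
    return pvariance([log((s[i] + pseudo) / (s[j] + pseudo)) for s in counts])


def pick_reference_taxa(counts, k, pseudo=1.0):
    """Pick the k taxa with the lowest median pairwise log-ratio variance.

    A crude stand-in for a network-based search for invariant taxa.
    """
    n_taxa = len(counts[0])
    scores = []
    for i in range(n_taxa):
        others = [log_ratio_variance(counts, i, j, pseudo)
                  for j in range(n_taxa) if j != i]
        scores.append(median(others))
    return sorted(range(n_taxa), key=lambda i: scores[i])[:k]


def size_factors(counts, reference):
    """One size factor per sample: reference-taxa total, scaled to mean 1."""
    totals = [sum(s[i] for i in reference) for s in counts]
    if min(totals) == 0:
        raise ValueError("a sample has zero counts in all reference taxa")
    centre = mean(totals)
    return [t / centre for t in totals]


def normalize(counts, factors):
    """Divide every count in a sample by that sample's size factor."""
    return [[c / f for c in s] for s, f in zip(counts, factors)]


# Toy data (made up): rows are samples, columns are taxa.
# Taxa 0-2 keep fixed proportions; taxon 3 is much higher in samples 3-4.
# Sequencing depth differs by factors of 1, 2, 1 and 3.
counts = [
    [100, 50, 20, 10],
    [200, 100, 40, 20],
    [100, 50, 20, 200],
    [300, 150, 60, 600],
]

reference = pick_reference_taxa(counts, k=3)
factors = size_factors(counts, reference)
normalized = normalize(counts, factors)
```

In this toy example the differentially abundant taxon is left out of the reference set, so the size factors track sequencing depth only, and the invariant taxa end up with equal normalized counts in every sample. A size factor based on total counts would instead be inflated in the samples where taxon 3 is abundant.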

Keywords: framework; abundance; size; microbiome data; differential abundance

Journal Title: Bioinformatics
Year Published: 2020
