To rarefy or not to rarefy: robustness and efficiency trade-offs of rarefying microbiome data.

MOTIVATION Microbiome datasets provide rich information about microbial communities. However, library sizes vary widely across samples, which makes valid statistical comparison difficult. Rarefaction is often used in practice as a normalization technique to address this, although whether it should ever be used is debated. Conventional wisdom and previous work suggested that rarefaction should never be used in practice, arguing that rarefying microbiome data is statistically inadmissible. These discussions, however, have been confined to particular parametric models and simulation studies.

RESULTS We develop a semiparametric graphical model framework for grouped microbiome data and analyze the statistical trade-offs of rarefaction in the context of differential abundance testing, accounting for latent variations and measurement errors. Under the framework, it can be shown that rarefaction guarantees that subsequent permutation tests properly control the Type I error. In addition, the loss in sensitivity from rarefaction is solely due to increased measurement error; if the underlying variation in microbial composition is large among samples, rarefaction might not hurt subsequent statistical inference much. We develop the rarefaction efficiency index (REI) as an indicator of efficiency loss and illustrate it with a data set on the effect of storage conditions on microbiome data. Simulation studies based on real data demonstrate that the impact of rarefaction on sensitivity is negligible when overdispersion is prominent, while low REI corresponds to scenarios in which rarefying might substantially lower the statistical power. Whether to rarefy ultimately depends on assumptions about the data-generating process and on characteristics of the data.

AVAILABILITY Source code is publicly available at https://github.com/jcyhong/rarefaction.
SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
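
As a rough illustration of the workflow the abstract describes (rarefy, then run a permutation test), here is a minimal Python sketch. It is not the authors' implementation, which is in the repository linked above. The function names `rarefy` and `permutation_pvalue`, the toy count table, and the choice of test statistic (absolute difference in group means for one taxon) are all assumptions made for this example. Rarefying is taken here to mean subsampling each sample's reads without replacement down to a common depth.

```python
import numpy as np

def rarefy(counts, depth=None, seed=0):
    """Subsample each row (sample) without replacement to a common depth.

    Assumed convention: rows are samples, columns are taxa. Samples with
    fewer than `depth` reads are dropped; `keep` marks the retained rows.
    """
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)
    library_sizes = counts.sum(axis=1)
    if depth is None:
        depth = int(library_sizes.min())  # default: smallest library size
    keep = library_sizes >= depth
    rarefied = np.vstack([
        rng.multivariate_hypergeometric(row, depth) for row in counts[keep]
    ])
    return rarefied, keep

def permutation_pvalue(values, groups, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups, dtype=bool)
    rng = np.random.default_rng(seed)

    def stat(g):
        return abs(values[g].mean() - values[~g].mean())

    observed = stat(groups)
    # Shuffle group labels and count statistics at least as extreme.
    hits = sum(stat(rng.permutation(groups)) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 4 samples x 3 taxa with very different library sizes.
counts = np.array([
    [50, 30, 20],
    [500, 300, 200],
    [5, 80, 15],
    [40, 900, 60],
])
groups = np.array([True, True, False, False])

rarefied, keep = rarefy(counts)
p_value = permutation_pvalue(rarefied[:, 0], groups[keep])
print(rarefied.sum(axis=1), p_value)
```

After rarefying, every retained sample has the same library size, so the permutation test compares counts on a common scale. The cost, as the abstract notes, is added measurement error from discarding reads.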

Keywords: microbiome data; rarefying microbiome; rarefy; efficiency; rarefaction; trade offs

Journal Title: Bioinformatics
Year Published: 2022
