MetaBMF: a scalable binning algorithm for large-scale reference-free metagenomic studies

MOTIVATION Metagenomics studies the microbial genomes in an ecosystem such as the human gastrointestinal tract. Identifying novel microbial species and quantifying how their distributions vary among samples sequenced with next-generation sequencing technology are key to the success of most metagenomic studies. To achieve these goals, we propose MetaBMF, a simple yet powerful metagenomic binning method. The method requires no prior knowledge of reference genomes and produces highly accurate results, even at the strain level. It can therefore be broadly used to identify disease-related microbial organisms that are not well studied.

RESULTS Mathematically, our input matrix contains the number of reads mapped to each assembled genomic fragment across different samples, and we propose a scalable stratified angle regression algorithm that factorizes this count matrix into the product of a binary matrix and a nonnegative matrix. The binary matrix separates the microbial species, and the nonnegative matrix quantifies the species distributions in the different samples. In simulation and empirical studies, we demonstrate that MetaBMF has high binning accuracy: it bins DNA fragments accurately not only at the species level but also at the strain level. In our example, we accurately identify the Shiga-toxigenic E. coli O104:H4 strain that caused the 2011 German E. coli outbreak. Our efforts in these areas should lead to (1) fundamental advances in metagenomic binning, (2) the development and refinement of technology for rapidly identifying and quantifying microbial distributions, and (3) the reliable identification of potential probiotics or pathogenic bacterial strains.

AVAILABILITY The software is available at https://github.com/didi10384/MetaBMF.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
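The abstract models the fragment-by-sample read-count matrix X as the product Z W of a binary membership matrix Z and a nonnegative abundance matrix W. The Python sketch below illustrates only that model. It is not MetaBMF's stratified angle regression, whose details are not given here. It swaps in a plain alternating scheme: an exhaustive search over binary row patterns for Z, then projected gradient descent for W. It also assumes that reads have already been mapped to contigs and arrive as hypothetical (contig_index, sample_index) pairs. The function names `count_matrix` and `binary_nonneg_factorize` are illustrative, not part of the MetaBMF software.

```python
import itertools

import numpy as np


def count_matrix(read_hits, n_contigs, n_samples):
    """Tally mapped reads into a contig-by-sample count matrix.

    read_hits is a hypothetical input format: one (contig_index, sample_index)
    pair per read that was mapped to an assembled contig.
    """
    X = np.zeros((n_contigs, n_samples))
    for contig, sample in read_hits:
        X[contig, sample] += 1
    return X


def binary_nonneg_factorize(X, k, n_iter=50, pg_steps=100, seed=0):
    """Approximate X (m x n) by Z @ W, Z binary (m x k), W nonnegative (k x n).

    Alternating minimization: exhaustive Z-step, projected-gradient W-step.
    This is an illustrative stand-in, not stratified angle regression.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # All nonzero binary membership patterns of length k (2**k - 1 of them),
    # so every fragment is assigned to at least one bin.
    patterns = np.array(
        [p for p in itertools.product([0, 1], repeat=k) if any(p)], dtype=float
    )
    # Initialize each bin's abundance profile with a distinct row of X.
    W = X[rng.choice(m, size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Z-step: each row takes the pattern whose summed bin profiles are
        # closest to it in squared Euclidean distance (exact for fixed W).
        profiles = patterns @ W
        dists = ((X[:, None, :] - profiles[None, :, :]) ** 2).sum(axis=2)
        Z = patterns[dists.argmin(axis=1)]
        # W-step: projected gradient descent on 0.5 * ||X - Z W||^2 with W >= 0;
        # step 1/L with L = largest eigenvalue of Z^T Z never increases the loss.
        step = 1.0 / np.linalg.eigvalsh(Z.T @ Z).max()
        for _ in range(pg_steps):
            W = np.maximum(W - step * (Z.T @ (Z @ W - X)), 0.0)
    return Z, W


if __name__ == "__main__":
    # Toy data: contigs 0-1 are covered only in sample 0, contigs 2-3 only in sample 1.
    hits = [(0, 0)] * 10 + [(1, 0)] * 10 + [(2, 1)] * 10 + [(3, 1)] * 10
    X = count_matrix(hits, n_contigs=4, n_samples=2)
    Z, W = binary_nonneg_factorize(X, k=2)
    print("bin membership Z:\n", Z)
    print("bin abundances W:\n", W)
```

On the toy data, contigs 0 and 1 should share one bin and contigs 2 and 3 the other, with Z @ W reproducing X. Which bin receives which label depends on the random initialization. A binary row with more than one 1 lets a fragment belong to several bins, which is one way shared sequence between closely related strains could be represented. Enumerating all 2^k - 1 patterns keeps the Z-step exact and easy to follow, but its cost grows exponentially in k, so this sketch does not scale to realistic numbers of bins.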

Keywords: metabmf scalable; different samples; matrix; binning; reference; metagenomic studies

Journal Title: Bioinformatics
Year Published: 2020
