MOTIVATION Metagenomics studies microbial genomes in an ecosystem such as the gastrointestinal tract of a human. Identification of novel microbial species and quantification of their distributional variations among different samples… Click to show full abstract
MOTIVATION Metagenomics studies microbial genomes in an ecosystem such as the gastrointestinal tract of a human. Identification of novel microbial species and quantification of their distributional variations among different samples that are sequenced using next-generation-sequencing technology hold the key to the success of most metagenomic studies. To achieve these goals, we propose a simple yet powerful metagenomic binning method, MetaBMF. The method does not require prior knowledge of reference genomes and produces highly accurate results, even at a strain level. Thus, it can be broadly used to identify disease-related microbial organisms that are not well-studied. RESULTS Mathematically, we count the number of mapped reads on each assembled genomic fragment cross different samples as our input matrix and propose a scalable stratified angle regression algorithm to factorize this count matrix into a product of a binary matrix and a nonnegative matrix. The binary matrix can be used to separate microbial species and the nonnegative matrix quantifies the species distributions in different samples. In simulation and empirical studies, we demonstrate that MetaBMF has high binning accuracy. It can not only bin DNA fragments accurately at a species level but also at a strain level. As shown in our example, we can accurately identify the Shiga-toxigenic E. coli O104:H4 strain which led to the 2011 German E.coli outbreak. Our efforts in these areas should lead to (1) fundamental advances in metagenomic binning (2) development and refinement of technology for the rapid identification and quantification of microbial distributions (3) finding of potential probiotics or reliable pathogenic bacterial strains. AVAILABILITY The software is available at https://github.com/didi10384/MetaBMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
               
Click one of the above tabs to view related content.