MarDRe: efficient MapReduce‐based removal of duplicate DNA reads in the cloud

Summary: This article presents MarDRe, a de novo cloud‐ready duplicate and near‐duplicate removal tool that can process single‐ and paired‐end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud‐based infrastructures. Written in Java to maximize cross‐platform compatibility, MarDRe is built upon the open‐source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16‐node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state‐of‐the‐art tool.

Availability and implementation: Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es.

Contact: [email protected]
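To make the MapReduce idea in the summary concrete, here is a minimal single-machine sketch in plain Java. It is not MarDRe's code and does not use the Hadoop API. It only illustrates the general pattern of grouping reads by a key in a "map" step and discarding repeats within each group in a "reduce" step. The class name `DedupSketch`, the constant `PREFIX_LENGTH`, and the choice of a sequence prefix as the grouping key are assumptions made for illustration. The sketch handles exact duplicates only, not near-duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy, in-memory imitation of a map/reduce
// duplicate-removal pass. Names and parameters are hypothetical.
final class DedupSketch {
    // Assumed parameter: number of leading bases used as the grouping key.
    static final int PREFIX_LENGTH = 4;

    // "Map" step: key each read by its prefix so that identical reads
    // are guaranteed to land in the same group.
    static Map<String, List<String>> mapPhase(List<String> reads) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String read : reads) {
            String key = read.substring(0, Math.min(PREFIX_LENGTH, read.length()));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(read);
        }
        return groups;
    }

    // "Reduce" step: within one group, keep the first occurrence of each
    // distinct sequence (exact matching only).
    static List<String> reducePhase(List<String> group) {
        return new ArrayList<>(new LinkedHashSet<>(group));
    }

    // Runs both steps and concatenates the per-group results.
    static List<String> removeDuplicates(List<String> reads) {
        List<String> unique = new ArrayList<>();
        for (List<String> group : mapPhase(reads).values()) {
            unique.addAll(reducePhase(group));
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList(
                "ACGTAC", "ACGTAC", "ACGTTT", "TTGACA", "TTGACA");
        System.out.println(removeDuplicates(reads));
    }
}
```

Because each group is reduced independently of the others, the groups could in principle be processed on different nodes, which is the property that a distributed framework such as Hadoop exploits.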

Keywords: duplicate; removal duplicate; efficient mapreduce; based removal; mapreduce based; mardre efficient

Journal Title: Bioinformatics
Year Published: 2017
