"MarDRe: efficient MapReduce‐based removal of duplicate DNA reads in the cloud"

Summary: This article presents MarDRe, a de novo cloud‐ready duplicate and near‐duplicate removal tool that can process single‐ and paired‐end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud‐based infrastructures. Written in Java to maximize cross‐platform compatibility, MarDRe is built upon the open‐source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16‐node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state‐of‐the‐art tool.

Availability and implementation: Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es.
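
To make the idea of MapReduce‐style duplicate removal concrete, the following is a minimal, single‐process sketch in plain Java. It is not MarDRe's code and does not use the Hadoop API. It assumes, purely for illustration, that reads are grouped by a fixed‐length prefix (the "map" step) and that, within each group, a read is dropped when it differs from an already kept read in at most a given number of positions (the "reduce" step). The class name `DedupSketch`, its methods, and the parameters `prefixLen` and `maxMismatches` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: an in-memory imitation of a map/group/reduce pipeline.
// None of these names come from MarDRe or Hadoop.
final class DedupSketch {

    // "Map" step (assumed scheme): key each read by its first prefixLen bases,
    // so that candidate duplicates end up in the same group.
    static Map<String, List<String>> mapByPrefix(List<String> reads, int prefixLen) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String read : reads) {
            String key = read.substring(0, Math.min(prefixLen, read.length()));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(read);
        }
        return groups;
    }

    // Number of differing positions; reads of different lengths are
    // treated as never matching in this sketch.
    static int mismatches(String a, String b) {
        if (a.length() != b.length()) {
            return Integer.MAX_VALUE;
        }
        int count = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    // "Reduce" step: keep a read unless it is within maxMismatches
    // of a read already kept from the same group.
    static List<String> reduceGroup(List<String> group, int maxMismatches) {
        List<String> kept = new ArrayList<>();
        for (String candidate : group) {
            boolean duplicate = false;
            for (String k : kept) {
                if (mismatches(candidate, k) <= maxMismatches) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    // Runs the map step, then the reduce step on every group.
    static List<String> removeDuplicates(List<String> reads, int prefixLen, int maxMismatches) {
        List<String> result = new ArrayList<>();
        for (List<String> group : mapByPrefix(reads, prefixLen).values()) {
            result.addAll(reduceGroup(group, maxMismatches));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("ACGTACGT", "ACGTACGT", "ACGTACGA", "TTGTACGT");
        // Prefix length 4, at most 1 mismatch allowed between near-duplicates.
        System.out.println(removeDuplicates(reads, 4, 1)); // prints [ACGTACGT, TTGTACGT]
    }
}
```

In this sketch, groups are independent of one another, which is what would let a distributed framework process them on different nodes. One limitation of the assumed scheme is that two reads differing inside the prefix land in different groups and are never compared.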

Keywords: duplicate; removal duplicate; efficient mapreduce; based removal; mapreduce based; mardre efficient

Journal Title: Bioinformatics
Year Published: 2017
