Summary: This article presents MarDRe, a de novo cloud‐ready duplicate and near‐duplicate removal tool that can process single‐ and paired‐end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud‐based infrastructures. Written in Java to maximize cross‐platform compatibility, MarDRe is built upon the open‐source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16‐node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state‐of‐the‐art tool.

Availability and implementation: Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es.

Contact: [email protected]
               