Kollector: transcript-informed, targeted de novo assembly of gene loci

Motivation: Despite considerable advancements in sequencing and computing technologies, de novo assembly of whole eukaryotic genomes is still a time‐consuming task that requires a significant amount of computational resources and expertise. A targeted assembly approach that performs local assembly of sequences of interest remains a valuable option for some applications. This is especially true for gene‐centric assemblies, whose resulting sequences can be readily used for more focused biological research. Here we describe Kollector, an alignment‐free targeted assembly pipeline that uses thousands of transcript sequences concurrently to inform the localized assembly of corresponding gene loci. Kollector robustly reconstructs introns and novel sequences within these loci, and scales well to large genomes. These properties make it especially useful for researchers working on non‐model eukaryotic organisms.

Results: We demonstrate the performance of Kollector for assembling complete or near‐complete Caenorhabditis elegans and Homo sapiens gene loci from their respective input transcripts. In a time‐ and memory‐efficient manner, the Kollector pipeline reconstructs 99% of C. elegans and 80% of H. sapiens transcript targets in their corresponding genomic space using whole genome shotgun sequencing reads, compared with 86% and 73%, respectively, for standard de novo assembly techniques. We also show that Kollector outperforms both established and recently released targeted assembly tools. Finally, we demonstrate three use cases for Kollector, including comparative and cancer genomics applications.

Availability and Implementation: Kollector is implemented as a bash script, and is available at https://github.com/bcgsc/kollector

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
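
The abstract describes Kollector as alignment‐free and transcript‐informed but does not spell out the mechanism. As an illustration only, the following Python sketch shows one generic way such recruitment could work: reads are selected by shared k‐mers rather than by alignment, and recruited reads extend the k‐mer set so that intronic reads can be pulled in over successive rounds. The function names (`kmers`, `recruit_reads`), the parameters (`k`, `min_shared`, `max_rounds`), and the toy sequences are all hypothetical; this is not Kollector's implementation, and a real pipeline would hand the recruited reads to a de novo assembler.

```python
def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def recruit_reads(transcripts, reads, k=5, min_shared=0.5, max_rounds=10):
    """Greedy, alignment-free read recruitment (illustrative assumption).

    Seeds a k-mer set from the transcripts, then repeatedly recruits any
    read that shares at least `min_shared` of its k-mers with the set.
    Each recruited read adds its own k-mers, so the set can grow into
    regions (such as introns) that are absent from the transcripts.
    """
    seed = set()
    for t in transcripts:
        seed.update(kmers(t, k))

    recruited = []
    remaining = list(reads)
    for _ in range(max_rounds):
        still_unrecruited = []
        added = False
        for read in remaining:
            read_kmers = list(kmers(read, k))
            if not read_kmers:
                continue  # read shorter than k: nothing to compare, drop it
            shared = sum(1 for km in read_kmers if km in seed)
            if shared / len(read_kmers) >= min_shared:
                recruited.append(read)
                seed.update(read_kmers)
                added = True
            else:
                still_unrecruited.append(read)
        remaining = still_unrecruited
        if not added:
            break  # no read recruited this round, so further rounds cannot help
    return recruited


if __name__ == "__main__":
    # Toy data: the transcript is two exons joined; the reads come from a
    # made-up locus where an intron sits between those exons.
    transcript = "ACGTACGGATTTGACCAGTA"
    reads = ["GGATCCCCGG", "TTTTTTTTTT", "TACGGATCCC"]
    print(recruit_reads([transcript], reads, k=4, min_shared=0.5))
```

In this toy run the read `TACGGATCCC` overlaps the exon boundary and is recruited first; its k‐mers then allow the mostly intronic read `GGATCCCCGG` to be recruited in the next round, while the unrelated poly‐T read is never selected. The threshold `min_shared` trades sensitivity against the risk of recruiting off‐target reads.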

Keywords: de novo assembly; Kollector; gene; gene loci

Journal Title: Bioinformatics
Year Published: 2017
