Novel sequence insertions (NSIs) are an essential category of genomic structural variations (SVs), representing DNA segments that are absent from the reference genome assembly. They have important biological functions and correlate strongly with phenotypes and diseases. The rapid development of long-read sequencing technologies provides an opportunity to discover NSIs more sensitively, since much longer reads help in assembling and locating the novel sequences. However, most state-of-the-art long-read-based SV detection approaches are designed generically to detect various kinds of SVs, and they are either not suited to detecting NSIs or computationally expensive. Herein, we propose the read clustering and assembly-based novel insertion detection tool (rCANID). It applies tailored clustering of chimerically aligned and unaligned reads, together with lightweight local assembly methods, to reconstruct inserted sequences at low computational cost. Benchmarks on both simulated and real datasets demonstrate that rCANID can discover NSIs sensitively and efficiently, especially NSI events with long inserted sequences, whose detection is still a non-trivial task for state-of-the-art approaches. Given its NSI detection ability, rCANID is well suited for integration into computational pipelines, where it can play an important role in many cutting-edge genomics studies.
               