Viral sequence integrations in the human genome have been implicated in various human diseases. Viral integrations remain among the structural changes of the human genome that are most challenging to detect. No studies have systematically analyzed how molecular and bioinformatics factors affect the power (sensitivity) to detect viral integrations using high-throughput sequencing (HTS). We selected a wide range of molecular and bioinformatics factors covering genome sequence characteristics, HTS features, and viral integration detection. We designed a fast simulation-based framework to model the process of detecting variable viral integration events in the human genome. We then examined the associations of the selected factors with viral integration detection power. We identified six factors that significantly affected viral integration detection power (P < 2 × 10⁻¹⁶). The factors most strongly associated with detection power were the proportion of sample cells with clonal viral integrations (Pearson's ρ = 0.64), sequencing depth (ρ = 0.37), length of viral integration (ρ = 0.37), paired-end read insert size (ρ = 0.23), the user-defined threshold (number of supporting reads) required to claim successful identification of an integration (ρ = −0.19), and read length when sequence volume was fixed (ρ = −0.09). As the first tool of its kind, VIpower incorporates all these factors, which can be adjusted in concert to optimize detection power. This tool may be used to estimate viral integration detection power for various combinations of sequencing or analytic parameters. It may also be used to estimate the parameters required to achieve a specific power when designing new sequencing experiments.
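To make the idea of simulation-based power estimation concrete, the following is a minimal toy sketch in Python. It is not VIpower's implementation, and its model is an assumption made purely for illustration: fragments are placed uniformly over a local window around a single integration, and a read pair counts as supporting evidence only if one mate lies entirely in viral sequence and the other entirely in human sequence. The function name `simulate_power` and all of its parameters (including the `flank` size and the fixed fragment count per trial) are hypothetical.

```python
import random


def simulate_power(clonal_fraction, depth, viral_len, insert_size, read_len,
                   min_support, flank=1000, trials=200, seed=0):
    """Toy Monte Carlo estimate of viral integration detection power.

    Assumed model (illustrative only): the virus occupies [left, right)
    inside a window with `flank` human bases on each side, and fragments
    of length `insert_size` start uniformly within that window.
    """
    rng = random.Random(seed)
    local_len = 2 * flank + viral_len
    left, right = flank, flank + viral_len
    max_start = local_len - insert_size
    if max_start < 0:
        return 0.0  # fragment does not fit in the modeled window

    # Fragments drawn from integration-carrying cells; each pair
    # contributes 2 * read_len sequenced bases toward the depth.
    n_fragments = round(clonal_fraction * depth * local_len / (2 * read_len))

    def in_virus(start, end):
        return left <= start and end <= right

    def in_human(start, end):
        return end <= left or start >= right

    detected = 0
    for _ in range(trials):
        support = 0
        for _ in range(n_fragments):
            start = rng.randint(0, max_start)
            end = start + insert_size
            mate1 = (start, start + read_len)
            mate2 = (end - read_len, end)
            # Chimeric pair: one mate viral, the other human.
            if (in_virus(*mate1) and in_human(*mate2)) or \
               (in_human(*mate1) and in_virus(*mate2)):
                support += 1
        if support >= min_support:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    for fraction in (0.01, 0.05, 0.2, 1.0):
        power = simulate_power(fraction, depth=30, viral_len=5000,
                               insert_size=400, read_len=100, min_support=2)
        print(f"clonal fraction {fraction:.2f}: estimated power {power:.2f}")
```

Even in this simplified model, the six factors named in the abstract appear as inputs, so one can vary them together and observe how the estimated power responds. A real analysis would also need to account for alignment, mappability, and split reads, which this sketch ignores.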
               