
PO-348 The other 98%: making sense of non-coding variation



Introduction
Despite the ever-increasing availability of whole genome sequencing datasets, our ability to understand the functional impact of genetic variation remains largely restricted to the coding portion of the genome. Although a selection of methods exists to estimate variant damage potential, these methods are heavily biased towards coding variation and have limited ability to detect potentially functional variants in non-coding regions. Furthermore, these methods either ignore the tissue context of the variant or combine epigenetic annotations from multiple tissues, often dominated by blood cell lines, and are therefore poorly suited for predicting the functional impact of variants in regulatory elements that act in a tissue-specific manner.

Material and methods
To address these limitations, we have developed a machine learning classifier aimed at prioritising non-coding variants while taking into account relevant tissue context. We defined ‘functional impact’ as the propensity of a variant to cause allele-specific chromatin accessibility. DNase Hypersensitive Site (DHS) variants found to display preferential accessibility, as well as a matched set of negative variants, were annotated with a range of features, including a set of core epigenetic marks from the ROADMAP Epigenomics project matched to the tissue of origin of the positive variant set. We then combined several machine learning algorithms to train an ensemble classifier. On a balanced test set (equal numbers of positive and negative variants) our model achieved an area under the curve (AUC) of ~75%. On a realistic dataset, where the negative variants outnumbered the positive ones 100:1, we achieved an AUC of 90%.

Results and discussions
To demonstrate the importance of tissue context when estimating the functional impact of non-coding variants, we compared the classifier’s performance on the same set of variants annotated with epigenetic information from either a closely matched or an unrelated tissue. We observed a notable drop in performance when using epigenetic context from a mismatched tissue. We show that our method drastically outperforms existing damage estimation tools in its ability to predict allele-specific chromatin accessibility. Finally, we demonstrate the utility of our method by successfully prioritising experimentally validated regulatory variants among the large number of variants within the same linkage disequilibrium block.

Conclusion
The above method will aid in the interpretation of whole genome sequencing datasets generated in cancer and rare disease studies.
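
The evaluation outlined under Material and methods (combining several models into an ensemble, then measuring AUC on a balanced and on a 100:1 test set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the models were combined, so simple score averaging is assumed here. The helper names (`roc_auc`, `average_ensemble`, `simulate_scores`) are hypothetical, and the scores are synthetic Gaussian draws rather than real variant annotations, so the printed values are not meant to reproduce the reported figures.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half), computed from average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U
    return u_stat / (n_pos * n_neg)


def average_ensemble(per_model_scores):
    """Assumed combination rule: mean of the scores each model assigns."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


def simulate_scores(labels, shift, rng):
    """Hypothetical stand-in for a trained model: unit-variance Gaussian
    scores, with positives shifted upward by `shift`."""
    return [rng.gauss(shift if y else 0.0, 1.0) for y in labels]


rng = random.Random(0)
test_sets = {
    "balanced (1:1)": [1] * 500 + [0] * 500,
    "imbalanced (1:100)": [1] * 100 + [0] * 10000,
}
for name, labels in test_sets.items():
    models = [simulate_scores(labels, 1.0, rng) for _ in range(3)]
    ensemble = average_ensemble(models)
    print(name, "ensemble AUC:", round(roc_auc(labels, ensemble), 3))
```

The rank-based formulation avoids building an explicit ROC curve and handles tied scores, which keeps the sketch free of third-party dependencies.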

Keywords: non-coding; functional impact; coding variation; tissue

Journal Title: ESMO Open
Year Published: 2018
