In recent years, technological advances in capturing genetic variation in large populations have led to the identification of large numbers of putative or disease-causing variants. However, mechanistic understanding of these variants lags far behind, which poses new challenges regarding their relevance for disease phenotypes, particularly for common complex disorders. In this study, we propose a systematic pipeline to infer biological meaning from genetic variants, specifically rare Copy Number Variants (CNVs). The pipeline consists of three modules that seek to 1) improve genetic data quality by excluding low-confidence CNVs, 2) identify disrupted biological processes, and 3) aggregate similar enriched biological process terms using semantic similarity. The proposed pipeline was applied to CNVs from individuals diagnosed with Autism Spectrum Disorder (ASD). We found that rare CNVs disrupting brain-expressed genes dysregulated a wide range of biological processes, such as nervous system development and protein polyubiquitination. The disrupted biological processes identified in ASD patients were in accordance with previous findings. This coherence with the literature indicates the feasibility of the proposed pipeline for interpreting the biological role of genetic variants in complex disease development. The pipeline is easily adjustable at each step, and its independence from any specific dataset and software makes it an effective tool for analyzing existing genetic resources. The FunVar pipeline is available at https://github.com/lasigeBioTM/FunVar and includes pre- and post-processing steps to effectively interpret the biological mechanisms of putative disease-causing genetic variants.
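To make the three-module structure concrete, the following is a minimal Python sketch of how such a pipeline could be organized. It is not the FunVar implementation: the function names, the `confidence` field, the use of a hypergeometric test for enrichment, and the greedy grouping strategy are all assumptions chosen for illustration, and the semantic similarity measure is passed in as a caller-supplied function rather than computed here.

```python
from math import comb


def filter_cnvs(cnvs, min_confidence):
    """Module 1 (sketch): drop CNVs below a confidence threshold.

    Assumes each CNV is a dict with a hypothetical 'confidence' score.
    """
    return [c for c in cnvs if c["confidence"] >= min_confidence]


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(genes, annotations, background_size, alpha):
    """Module 2 (sketch): over-representation test per biological process term.

    'annotations' maps a term to the set of background genes annotated to it.
    The hypergeometric test is an assumption, not necessarily the paper's test,
    and no multiple-testing correction is applied in this sketch.
    """
    gene_set = set(genes)
    results = {}
    for term, term_genes in annotations.items():
        k = len(gene_set & term_genes)
        if k == 0:
            continue
        p = hypergeom_pvalue(k, len(gene_set), len(term_genes), background_size)
        if p <= alpha:
            results[term] = p
    return results


def group_terms(terms, similarity, threshold):
    """Module 3 (sketch): greedily group terms by semantic similarity.

    A term joins the first group whose representative (first member) is at
    least 'threshold' similar to it; otherwise it starts a new group.
    """
    groups = []
    for term in terms:
        for group in groups:
            if similarity(group[0], term) >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])
    return groups
```

Because each module is a plain function with explicit inputs, any step can be swapped out (for example, a different enrichment test or similarity measure) without touching the others, which mirrors the adjustability described above.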
               