Background: Genome-wide association studies (GWAS) have enabled the mapping of scores of genes for psychiatric disorders. A commonly used method to infer function for a set of associated genes is pathway analysis, in which genes are tested for enrichment in biological pathways with the goal of finding functional links between them. A drawback of pathway analysis, however, is that many genes in the human genome remain poorly characterized and cannot be assigned to a pathway. We propose a solution that organizes genes into functional categories on the basis of motifs extracted from sequence information alone. Motifs are short, recurring sequence patterns presumed to have biological functions. DNA sequence motifs can give rise to transcription factor binding sites, while amino acid sequence motifs can specify functional protein domains. For example, a transcription factor binding motif could indicate that a gene is part of a specific regulatory pathway, and a domain in a protein could indicate whether it is membrane-bound or part of a signaling cascade. Hence sequence motifs and domains provide elementary clues to function, even in the absence of experimental data.

Methods: We present a pilot analysis that tests for enrichment of schizophrenia (SZ) GWAS findings among sets of genes encoding specific protein domains. We selected protein domains from the signaling-domain subcategory of the SMART (Simple Modular Architecture Research Tool) database. For each domain, we exported the UniProt identifiers of the proteins possessing it, matched them to gene IDs, and pruned for redundancy. We dropped domain-specific gene lists with fewer than ten genes. We then took all genes implicated by the 108 SZ loci mapped by the PGC (2014) and tested for enrichment of GWAS hits in each protein domain gene list using Fisher's exact test. Where the result was significant, we obtained an empirical p-value by permutation to account for any dependency among the findings.
Results: We examined ten candidate domains in this trial study. Genes encoding several signaling domain proteins, e.g. acidPPC, PI3Kc, and PTB, were not represented among the PGC findings at all. Genes encoding some strong candidates, such as the PDZ domain, were enriched in PGC findings, but not significantly (odds ratio=1.73, Fisher p=0.26). However, genes encoding ankyrin domain proteins were significantly enriched in PGC SZ findings (odds ratio=2.38, permutation p=0.04).

Discussion: PDZ domains represent strong candidates because they organize proteins into complexes at the postsynaptic density (PSD), and exome sequencing has linked rare mutations in PSD proteins to SZ. However, we detected no significant enrichment of genes encoding PDZ domains among SZ GWAS hits, which involve common variants. Ankyrin domains link membrane proteins to the underlying cytoskeleton, and many ankyrin proteins are expressed in neurons, so this preliminary association is plausible. We conclude that this pilot analysis yielded promising findings, and we are automating our analysis pipeline to cover all human protein domains and DNA sequence motifs. Our goal is to test these systematically for enrichment with psychiatric GWAS findings. Motif/domain enrichment analysis could complement existing pathway analysis methods to discern function among sets of risk genes.
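The enrichment test described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the one-sided (enrichment-only) form of Fisher's exact test, the choice of background gene set, and the uniform gene-label permutation are all assumptions made for illustration. In particular, the abstract does not specify the permutation scheme, and the simple scheme below does not by itself model dependency between genes at the same locus.

```python
import random
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment (assumed form).

    a = domain genes that are GWAS hits
    b = domain genes that are not GWAS hits
    c = non-domain genes that are GWAS hits
    d = non-domain genes that are not GWAS hits
    Returns P(overlap >= a) under the hypergeometric null.
    """
    n = a + b + c + d
    domain = a + b
    hits = a + c
    upper = min(domain, hits)
    tail = sum(
        comb(domain, k) * comb(n - domain, hits - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, hits)


def permutation_p(domain_genes, gwas_genes, background, n_perm=10000, seed=0):
    """Empirical p-value from uniform gene-label permutation (hypothetical scheme).

    Draws random gene sets of the same size as the GWAS hit list from the
    background and counts how often the overlap with the domain gene list
    is at least as large as the observed overlap.
    """
    rng = random.Random(seed)
    domain = set(domain_genes)
    hits = set(gwas_genes)
    observed = len(domain & hits)
    pool = sorted(background)  # fixed order so the seed gives repeatable draws
    exceed = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(hits))
        if len(domain.intersection(sample)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy gene identifiers, invented purely for illustration.
    background = [f"gene{i}" for i in range(200)]
    domain_genes = background[:20]
    gwas_genes = background[15:35]
    a = len(set(domain_genes) & set(gwas_genes))
    b = len(domain_genes) - a
    c = len(gwas_genes) - a
    d = len(background) - a - b - c
    print("odds ratio:", odds_ratio(a, b, c, d))
    print("Fisher p:", fisher_enrichment_p(a, b, c, d))
    print("permutation p:", permutation_p(domain_genes, gwas_genes, background, 2000))
```

A permutation that resamples whole loci rather than individual genes would come closer to the stated aim of accounting for dependency among findings, but that design is not described in the abstract and is left out here.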
               