Motivation: Increasing evidence shows that nucleotide modifications such as methylation and hydroxymethylation of cytosine can greatly affect the binding of transcription factors (TFs). However, motif-finding algorithms that can search for motifs containing modified bases are lacking. In this study, we expand on our previous motif-finding pipeline, Epigram, to provide systematic de novo motif discovery and performance evaluation for methylated DNA motifs.

Results: mEpigram outperforms both MEME (Bailey et al. 2006) and DREME (Bailey 2011) at finding modified motifs in simulated data that mimic various motif enrichment scenarios. Furthermore, we identified methylated motifs in Arabidopsis DNA affinity purification sequencing (DAP-seq) data that were previously demonstrated to contain such motifs (O'Malley et al. 2016). When applied to TF ChIP-seq and DNA methylome data in H1 and GM12878 cells, our method successfully identified novel methylated motifs that can be recognized by the TFs or their co-factors. We also observed spacing constraints between the canonical motif of the TF and the newly discovered methylated motifs, which suggests operative recognition of these cis-elements by collaborative proteins.

Availability: The mEpigram program is available at https://github.com/Wang-lab-UCSD/mEpigram.

Supplementary information: More analysis results and supplementary data are available in a companion document and online.
               