A DNA motif is a sequence pattern shared by the DNA sequence segments that bind to a specific protein. Discovering motifs in a given DNA sequence dataset plays a vital… Click to show full abstract
A DNA motif is a sequence pattern shared by the DNA sequence segments that bind to a specific protein. Discovering motifs in a given DNA sequence dataset plays a vital role in studying gene expression regulation. As an important attribute of the DNA motif, the motif length directly affects the quality of the discovered motifs. How to determine the motif length more accurately remains a difficult challenge to be solved. We propose a new motif length prediction scheme named MotifLen by using supervised machine learning. First, a method of constructing sample data for predicting the motif length is proposed. Secondly, a deep learning model for motif length prediction is constructed based on the convolutional neural network. Then, the methods of applying the proposed prediction model based on a motif found by an existing motif discovery algorithm are given. The experimental results show that i) the prediction accuracy of MotifLen is more than 90% on the validation set and is significantly higher than that of the compared methods on real datasets, ii) MotifLen can successfully optimize the motifs found by the existing motif discovery algorithms, and iii) it can effectively improve the time performance of some existing motif discovery algorithms.
               
Click one of the above tabs to view related content.