"Identifying N7-methylguanosine sites by integrating multiple features."

Recent studies have reported that N7-methylguanosine (m7G) plays a vital role in the regulation of gene expression. Determining the distribution of m7G is therefore a crucial step towards understanding its biological functions. Although experimental approaches can locate m7G sites accurately, they are labor-intensive, costly, and time-consuming. It is therefore necessary to develop effective and robust computational methods to replace, or at least complement, current experimental methods. In this study, we developed a novel sequence-based computational tool to identify RNA m7G sites. In this model, 22 dinucleotide physicochemical (PC) properties were used to encode the RNA sequence. Three types of descriptors (auto-covariance, cross-covariance, and discrete wavelet transform) were adopted to extract effective features from the resulting PC matrix. The least absolute shrinkage and selection operator (LASSO) algorithm was used to reduce the influence of irrelevant or redundant features. Finally, the selected features were fed into a support vector machine (SVM) to distinguish m7G sites from non-m7G sites. The proposed method significantly outperforms existing predictors across all evaluation metrics, which indicates that the approach is effective in identifying RNA m7G sites.
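The auto-covariance and cross-covariance descriptors mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the 22 PC properties or the exact normalization, so the sketch assumes two stand-in properties (GC count and purine count per dinucleotide) and a common textbook definition of lagged covariance. All function names (`toy_properties`, `encode`, `ac_cc_features`, and so on) are hypothetical, and the discrete wavelet transform, LASSO selection, and SVM steps are left out.

```python
from typing import List

Matrix = List[List[float]]


def toy_properties(dinucleotide: str) -> List[float]:
    """Stand-in property values (ASSUMPTION: not the paper's 22 PC properties)."""
    gc_count = sum(base in "GC" for base in dinucleotide)
    purine_count = sum(base in "AG" for base in dinucleotide)
    return [float(gc_count), float(purine_count)]


def encode(sequence: str) -> Matrix:
    """Map each overlapping dinucleotide to one row of property values."""
    return [toy_properties(sequence[i:i + 2]) for i in range(len(sequence) - 1)]


def column_means(matrix: Matrix) -> List[float]:
    """Mean of each property (column) over all sequence positions."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def covariance(matrix: Matrix, j1: int, j2: int, lag: int,
               means: List[float]) -> float:
    """Lagged covariance between property j1 at position i and j2 at i + lag."""
    n = len(matrix)
    total = sum(
        (matrix[i][j1] - means[j1]) * (matrix[i + lag][j2] - means[j2])
        for i in range(n - lag)
    )
    return total / (n - lag)


def auto_covariance(matrix: Matrix, lag: int) -> List[float]:
    """One value per property: each property paired with itself at the lag."""
    means = column_means(matrix)
    return [covariance(matrix, j, j, lag, means) for j in range(len(means))]


def cross_covariance(matrix: Matrix, lag: int) -> List[float]:
    """One value per ordered pair of distinct properties at the lag."""
    means = column_means(matrix)
    p = len(means)
    return [
        covariance(matrix, j1, j2, lag, means)
        for j1 in range(p)
        for j2 in range(p)
        if j1 != j2
    ]


def ac_cc_features(sequence: str, max_lag: int) -> List[float]:
    """Concatenate auto- and cross-covariance values for lags 1..max_lag."""
    matrix = encode(sequence)
    if max_lag >= len(matrix):
        raise ValueError("max_lag must be smaller than the number of dinucleotides")
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        features.extend(auto_covariance(matrix, lag))
        features.extend(cross_covariance(matrix, lag))
    return features
```

With P properties, each lag contributes P auto-covariance values and P(P - 1) cross-covariance values, so `ac_cc_features("GGACUGGACU", max_lag=3)` returns 12 values for the two stand-in properties. The result is a fixed-length vector regardless of sequence length, which is what allows such descriptors to feed a feature selector and classifier.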

Keywords: m7G; identifying methylguanosine; integrating multiple; methylguanosine sites; sites integrating; m7G sites

Journal Title: Biopolymers
Year Published: 2021
