Key messageWe curated a reliable dataset of m6A sites in Arabidopsis thaliana, built competitive models for predicting m6A sites, extracted predominant rules from the prediction models and analyzed the most… Click to show full abstract
Key messageWe curated a reliable dataset of m6A sites in Arabidopsis thaliana, built competitive models for predicting m6A sites, extracted predominant rules from the prediction models and analyzed the most important features.AbstractIn biological RNA, approximately 150 chemical modifications have been discovered, of which N6-methyladenine (m6A) is the most prevalent and abundant. This modification plays an essential role in a myriad of biological mechanisms and regulates RNA localization, nuclear export, translation, stability, alternative splicing, and other processes. However, m6A-seq and other wet-lab techniques do not easily facilitate accurate and complete determination of m6A sites across the transcriptome. Therefore, the use of computational methods to establish accurate models for predicting m6A sites is essential. In this work, we manually curated a reliable dataset of m6A sites and non-m6A sites and developed a new tool called RFAthM6A for predicting m6A sites in Arabidopsis thaliana. Briefly, RFAthM6A consists of four independent models named RFPSNSP, RFPSDSP, RFKSNPF and RFKNF and strict benchmarks show that the AUC values of the four models reached 0.894, 0.914, 0.920 and 0.926, respectively in a fivefold cross validation and the prediction performance of RFPSDSP, RFKSNPF and RFKNF exceeded that of three previously reported models (AthMethPre, M6ATH and RAM-NPPS). Linear combination of the prediction scores of RFPSDSP, RFKSNPF and RFKNF improved the prediction performance. We also extracted several predominant rules that underlie the m6A site identification from the trained models. Furthermore, the most important features of the predictors for the m6A site identification were also analyzed in depth. To facilitate use of our proposed models by interested researchers, all the source codes and datasets are publicly deposited at https://github.com/nongdaxiaofeng/RFAthM6A.
               
Click one of the above tabs to view related content.