N-glycosylation is one of the most important post-translational modifications in eukaryotes. It plays a vital role in biological processes such as protein folding, stability, immunogenicity, cell signaling and protein targeting. N-glycosylation predominantly occurs at the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Accurate prediction of N-glycosylation sites is therefore essential for understanding the N-glycosylation mechanism.

We developed a random forest method, Nglyc, to predict N-glycosylation sites in eukaryotic protein sequences. The method uses 315 features derived from a protein sequence and its sequence homologs. Training and testing were performed on a dataset containing 895 N-glycosylation sites and 853 non-glycosylated sites obtained from 846 PDB structures. The method achieved a test accuracy of 81.02% using all 315 features and 82.48% using the top 200 features. Comparison with other N-glycosylation prediction methods shows that Nglyc performs better in terms of both sensitivity and specificity. Importantly, we tested the applicability of Nglyc using experimentally validated N-glycosylation sites in human and mouse genomes.

The Nglyc software is available for download at https://github.com/bioinformaticsML/Ngly.
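To make the sequon definition concrete, the following is a minimal Python sketch that lists candidate N-X-[S/T] sites in a protein sequence, with X being any residue other than proline. It only enumerates candidates and does not predict whether a site is actually glycosylated. The function name `find_sequons` is hypothetical and is not taken from the Nglyc software; the sketch also assumes the input is a plain one-letter amino acid string.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead keeps the match at a single character so that
# overlapping sequons (e.g. "NNTS") are all reported.
SEQUON_PATTERN = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of the Asn in each N-X-[S/T] sequon (X != P).

    Hypothetical helper for illustration only; it does not validate that
    the input contains only standard amino acid letters.
    """
    sequence = sequence.upper()
    return [match.start() + 1 for match in SEQUON_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    # N-A-T and N-G-T qualify; N-P-S is excluded because X is proline.
    print(find_sequons("MNATNPSANGT"))
```

Candidate positions found this way would be the inputs to a site-level classifier, since, as noted above, only a subset of sequons is glycosylated.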
               