Protein sequencing has rapidly changed the landscape of healthcare and the life sciences by accelerating the growth of diagnostics and personalized medicine for a variety of fatal diseases. Next-generation nanopore/nanoslit sequencing promises single-molecule resolution with read lengths on the scale of a chromosome. However, due to the inherent complexity of proteins, high-throughput sequencing of all 20 amino acids demands different approaches. To accelerate the detection of amino acids, a general machine learning (ML) method has been developed for quick and accurate prediction of the transmission function for amino acid sequencing. Among the ML models tested, the XGBoost regression model is found to be the most effective for fast prediction of the transmission function, with a very low test root-mean-square error (RMSE ≈ 0.05). In addition, using a random forest classifier, we are able to classify the neutral amino acids with a prediction accuracy of 100%. Our approach is therefore a first step toward predicting the transmission function through ML and can provide a platform for the quick and accurate identification of amino acids.
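For readers unfamiliar with the error metric quoted above, the following is a minimal sketch of how a root-mean-square error is computed between reference and predicted transmission values. The function name `rmse` and the sample numbers are illustrative assumptions, not data or code from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Mean of the squared residuals, then the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical transmission values (made up for illustration only).
reference = [0.0, 0.5, 1.0]
predicted = [0.1, 0.4, 1.0]
print(f"RMSE = {rmse(reference, predicted):.3f}")
```

A lower RMSE means the predicted values lie closer to the reference values on average; the metric carries the same units as the quantity being predicted.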
               