LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Resolving Protein Conformational Plasticity and Substrate Binding via Machine Learning.

Photo from wikipedia

A long-standing target in elucidating the biomolecular recognition process is the identification of binding-competent conformations of the receptor protein. However, protein conformational plasticity and the stochastic nature of the recognition… Click to show full abstract

A long-standing target in elucidating the biomolecular recognition process is the identification of binding-competent conformations of the receptor protein. However, protein conformational plasticity and the stochastic nature of the recognition processes often preclude the assignment of a specific protein conformation to an individual ligand-bound pose. Here, we demonstrate that a computational framework coined as RF-TICA-MD, which integrates an ensemble decision-tree-based Random Forest (RF) machine learning (ML) technique with an unsupervised dimension reduction approach time-structured independent component analysis (TICA), provides an efficient and unambiguous solution toward resolving protein conformational plasticity and the substrate binding process. In particular, we consider multimicrosecond-long molecular dynamics (MD) simulation trajectories of a ligand recognition process in solvent-inaccessible cavities of archetypal proteins T4 lysozyme and cytochrome P450cam. We show that in a scenario in which clear correspondence between protein conformation and binding-competent macrostates could not be obtained via an unsupervised dimension reduction approach, an a priori decision-tree-based supervised classification of the simulated recognition trajectories via RF would help characterize key amino acid residue pairs of the protein that are deemed sensitive for ligand binding. A subsequent unsupervised dimensional reduction of the selected residue pairs via TICA would then delineate a conformational landscape of protein which is able to demarcate ligand-bound poses from unbound ones. The proposed RF-TICA-MD approach is shown to be data agnostic and found to be robust when using other ML-based classification methods such as XGBoost. As a promising spinoff of the protocol, the framework is found to be capable of identifying distal protein locations which would be allosterically important for ligand binding and would characterize their roles in recognition pathways. A Python implementation of a proposed ML workflow is available in GitHub https://github.com/navjeet0211/rf-tica-md.

Keywords: recognition; conformational plasticity; machine learning; protein conformational

Journal Title: Journal of chemical theory and computation
Year Published: 2023

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.