"Prediction of Drug–Target Interactions Based on Network Representation Learning and Ensemble Learning"

Identifying interactions between drugs and target proteins is a critical step in drug development, as it helps find new targets for drugs and accelerates development. The number of known drug–protein interactions (positive samples) is much lower than the number of unknown ones (negative samples), which creates a class imbalance. Most previous methods used only part of the negative samples to train the prediction model, so most of the information in the negative samples was neglected. A new method is therefore needed that predicts candidate drug-related proteins while fully utilising the negative samples to improve prediction performance. We present a method based on non-negative matrix factorisation and gradient boosting decision tree (GBDT), named NGDTP, to identify candidate drug–protein interactions. NGDTP integrates multiple kinds of protein similarities, drug–protein interactions, and multiple kinds of drug similarities at different levels, including target proteins of drugs, drug-related diseases, and side effects of drugs. We propose a network representation learning method based on matrix factorisation to learn low-dimensional vector representations of drug and protein nodes. On the basis of these low-dimensional node representations, a GBDT-based prediction model is constructed, which obtains the association score of each drug–protein pair by building multiple decision trees. NGDTP is an ensemble learning model that utilises all the negative samples to alleviate the problem of class imbalance. NGDTP achieves superior prediction performance compared with several state-of-the-art methods. The experimental results indicate that NGDTP also retrieves more actual drug–protein interactions in the top part of the prediction results, the portion to which biologists pay the most attention. In addition, case studies on 10 drugs further confirm the ability of NGDTP to identify potential candidate proteins for drugs.
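
The representation learning step can be illustrated with a minimal sketch. It assumes a plain non-negative matrix factorisation with multiplicative updates applied to a toy interaction matrix only. The factorisation in NGDTP also integrates the similarity data and is not reproduced here. The names `nmf`, `pair_features`, and the matrix `A` are hypothetical, chosen for illustration. Every drug–protein pair, including all negatives, becomes one feature row that a gradient boosting classifier could then be trained on.

```python
import numpy as np


def nmf(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix A (drugs x proteins) as W @ H.

    Uses the standard multiplicative update rules for the Frobenius
    objective. This is an assumed stand-in, not the paper's factorisation.
    """
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, rank)) + eps   # drug representations, one row per drug
    H = rng.random((rank, m)) + eps   # protein representations, one column per protein
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def pair_features(W, H):
    """Concatenate drug and protein vectors for every (drug, protein) pair.

    Row i * m + j of the result describes drug i paired with protein j,
    which matches the row-major order of A.ravel().
    """
    n, m = W.shape[0], H.shape[1]
    drug_part = np.repeat(W, m, axis=0)   # each drug vector repeated m times
    prot_part = np.tile(H.T, (n, 1))      # the protein block repeated n times
    return np.hstack([drug_part, prot_part])


# Toy interaction matrix (hypothetical data): 3 drugs x 4 proteins,
# 1 = known interaction (positive), 0 = unknown (treated as negative).
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1]], dtype=float)

W, H = nmf(A, rank=2)
X = pair_features(W, H)   # one row per drug-protein pair
y = A.ravel()             # labels for all pairs, negatives included
```

Because `X` and `y` cover every pair rather than a subsample of the negatives, a tree ensemble fitted on them sees the full, imbalanced label distribution, which is the setting the abstract describes.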

Keywords: negative samples; drug; drug protein; prediction; representation learning; network representation

Journal Title: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Year Published: 2021
