"Development of New Methods Needs Proper Evaluation - Benchmarking Sets for Machine Learning Experiments for Class A GPCRs"

New computational approaches for virtual screening are constantly being developed. However, before a particular tool is used to search for new active compounds, its effectiveness in that type of task must be examined. In this study, we conducted a detailed analysis of various aspects of preparing datasets for such an evaluation. We propose a protocol for fetching data from the ChEMBL database, examine various compound representations in terms of the possible bias resulting from the way they are generated, and define a new metric for comparing the structural similarity of compounds that is in line with chemical intuition. The newly developed metric is also used to evaluate various approaches to dividing a dataset into training and test sets, which are themselves examined in detail as a possible source of bias in the results. Finally, machine learning methods are applied in cross-validation studies of the datasets constructed within the paper, which constitute benchmarks for the assessment of computational methods developed for virtual screening tasks. Additionally, analogous datasets for class A G protein-coupled receptors (the 100 targets with the highest number of records) were prepared. They are available at http://gmum.net/benchmarks/, together with a script enabling reproduction of all results, available at https://github.com/lesniak43/ananas.
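To make the concern about train/test division concrete, the following is a minimal sketch of a similarity-aware split, written under several assumptions. It uses the standard Tanimoto coefficient on hypothetical sets of fingerprint bits, not the new metric defined in the paper, and a simple single-linkage clustering that is chosen here only for illustration. The function names, compound identifiers, and thresholds are all made up for this example.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_similarity(fps: dict, threshold: float) -> list:
    """Single-linkage clustering: link two compounds if similarity >= threshold."""
    parent = {name: name for name in fps}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(fps, 2):
        if tanimoto(fps[x], fps[y]) >= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in fps:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first; ties broken by member names for determinism.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


def cluster_split(fps: dict, threshold: float = 0.5, test_fraction: float = 0.3):
    """Assign whole clusters to the test set until it reaches the target size."""
    clusters = cluster_by_similarity(fps, threshold)
    target = test_fraction * len(fps)
    train, test = [], []
    # Smallest clusters go to the test set first, so large series stay in training.
    for cluster in reversed(clusters):
        if len(test) < target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


# Hypothetical toy fingerprints (sets of 'on' bit indices), not real ChEMBL data.
fps = {
    "a1": frozenset({1, 2, 3, 4}),
    "a2": frozenset({1, 2, 3, 5}),
    "a3": frozenset({1, 2, 3, 6}),
    "b1": frozenset({10, 11, 12}),
    "b2": frozenset({10, 11, 13}),
    "c1": frozenset({20, 21}),
}

train_ids, test_ids = cluster_split(fps, threshold=0.5, test_fraction=0.3)
print("train:", sorted(train_ids))
print("test:", sorted(test_ids))
```

Because whole clusters are assigned to one side, no test compound in this toy example reaches the similarity threshold against any training compound. A purely random split gives no such guarantee, and close analogues on both sides can inflate apparent performance.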

Keywords: development new; machine learning; new methods; class; evaluation

Journal Title: Journal of Chemical Information and Modeling
Year Published: 2019

