The availability of large chemical libraries containing hundreds of millions to billions of diverse drug‐like molecules, combined with an almost unlimited amount of compute power for scientific calculations, has renewed the interest of investors and researchers in virtual screening (VS) methods for identifying biologically active compounds. The number of in silico screening tools that exploit knowledge of the protein target or of known bioactive ligands is increasing rapidly, creating a crowded computational landscape in which it has become difficult to assess the real advantages and disadvantages of each individual VS technology in terms of accuracy and efficiency. In the current work, we evaluate the performance of several state‐of‐the‐art commercial software packages for 3D ligand‐based VS against well‐known 2D methods using an internally curated benchmarking data set. Our results show that the best individual method can differ significantly from one data set to another, and that combining methods using data fusion techniques improves enrichment in the top 1% of retrieved hits. Although 2D methods alone already provide a significant enrichment in the number of predicted active compounds, combining the data‐fused 2D results with just one of the best 3D methods (ROCS, FLAP or Blaze) further improves early enrichment and the likelihood of identifying additional chemotypes.
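As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes an enrichment factor at the top 1% of a ranked list and fuses several ranked lists by keeping each compound's best rank. The abstract does not state which enrichment definition or fusion rule was used, so both are assumptions here: the enrichment factor follows a commonly used definition (hit rate in the top slice divided by the hit rate in the whole list), best-rank fusion is only one of several possible fusion rules, and the function names and compound identifiers are hypothetical.

```python
from typing import Dict, List, Set


def enrichment_factor(ranked_ids: List[str], actives: Set[str],
                      fraction: float = 0.01) -> float:
    """Hit rate in the top `fraction` of the list divided by the overall hit rate.

    Assumed definition; the abstract does not specify the one used.
    """
    n_total = len(ranked_ids)
    n_top = max(1, int(round(n_total * fraction)))  # at least one compound
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    total_actives = sum(1 for cid in ranked_ids if cid in actives)
    if total_actives == 0:
        return 0.0  # no actives in the list, so enrichment is undefined; report 0
    return (hits_top / n_top) / (total_actives / n_total)


def fuse_by_best_rank(score_tables: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank-based fusion: each compound keeps its best (lowest) rank over all methods.

    `score_tables` maps a method name to {compound id: similarity score},
    where a higher score means more similar. Ties are broken by compound id.
    """
    best_rank: Dict[str, int] = {}
    for scores in score_tables.values():
        ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
        for rank, cid in enumerate(ordered, start=1):
            best_rank[cid] = min(rank, best_rank.get(cid, rank))
    return sorted(best_rank, key=lambda cid: (best_rank[cid], cid))
```

Under these assumptions, a fused 2D plus 3D ranking would be obtained with `fuse_by_best_rank({"2d": scores_2d, "3d": scores_3d})` and then evaluated with `enrichment_factor(fused, actives, fraction=0.01)`.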