Neural networks and deep learning have been successfully applied to tackle problems in drug discovery with increasing accuracy over time. There are still many challenges and opportunities to improve molecular… Click to show full abstract
Neural networks and deep learning have been successfully applied to tackle problems in drug discovery with increasing accuracy over time. There are still many challenges and opportunities to improve molecular property predictions with satisfactory accuracy even further. Here, we proposed a deep-learning architecture model, namely Bidirectional long short-term memory with Channel and Spatial Attention network (BCSA), of which the training process is fully data-driven and end to end. It is based on data augmentation and SMILES tokenization technology without relying on auxiliary knowledge, such as complex spatial structure. In addition, our model takes the advantages of the long- and short-term memory network (LSTM) in sequence processing. The embedded channel and spatial attention modules in turn specifically identify the prime factors in the SMILES sequence for predicting properties. The model was further improved by Bayesian optimization. In this work, we demonstrate that the trained BSCA model is capable of predicting aqueous solubility. Furthermore, our proposed method shows noticeable superiorities and competitiveness in predicting oil–water partition coefficient, when compared with state-of-the-art graphs models, including graph convoluted network (GCN), message-passing neural network (MPNN), and AttentiveFP.
               
Click one of the above tabs to view related content.