Herein, we propose a de novo direct inverse quantitative structure-property relationship/quantitative structure-activity relationship (QSPR/QSAR) analysis method that combines a chemical variational autoencoder (VAE) with Gaussian mixture regression (GMR) to generate molecules with desired values of the target properties and activities (y). A data set of molecules was analyzed with a chemical VAE model, whose encoder transforms simplified molecular input line entry system (SMILES) strings into latent variables (x) and whose decoder transforms x back into SMILES strings. A GMR model relating x and y was then constructed for direct inverse analysis. Target y values were input into the GMR model to directly predict the corresponding x values, and these predicted x values were passed to the decoder of the chemical VAE model, which output SMILES strings (i.e., chemical structures of molecules). In this way, the proposed method can selectively obtain molecules characterized by the target y values. We confirmed that the proposed method generates molecules within the target y ranges even in cases where the conventional chemical VAE model failed to generate such target molecules.
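The direct inverse step can be illustrated with a minimal NumPy sketch of GMR conditioning. It assumes a Gaussian mixture model has already been fitted to the concatenated vectors z = [x, y] (for example by EM; scikit-learn's `GaussianMixture` with `covariance_type="full"` exposes `weights_`, `means_` and `covariances_` in this layout). Conditioning each component on y gives the mean mu_x + Sigma_xy Sigma_yy^-1 (y - mu_y) and covariance Sigma_xx - Sigma_xy Sigma_yy^-1 Sigma_yx, and each component is reweighted by pi_k N(y; mu_y, Sigma_yy). The function names, the toy mixture parameters, and the choice between a weighted-mean estimate and sampling are illustrative assumptions, not the authors' implementation; the VAE decoder step is not shown.

```python
import numpy as np


def gaussian_pdf(v, mean, cov):
    """Multivariate normal density N(v; mean, cov) evaluated at one vector v."""
    diff = v - mean
    norm = np.sqrt((2 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def gmr_inverse(y_target, weights, means, covs, dim_x):
    """Condition a joint GMM over z = [x, y] on y = y_target (hypothetical helper).

    Returns the component weights, conditional means and conditional
    covariances of p(x | y_target), plus the weighted-mean estimate of x.
    """
    y_target = np.atleast_1d(np.asarray(y_target, dtype=float))
    resp, cond_means, cond_covs = [], [], []
    for pi_k, mu, cov in zip(weights, means, covs):
        mu_x, mu_y = mu[:dim_x], mu[dim_x:]
        s_xx = cov[:dim_x, :dim_x]
        s_xy = cov[:dim_x, dim_x:]
        s_yy = cov[dim_x:, dim_x:]
        gain = s_xy @ np.linalg.inv(s_yy)  # Sigma_xy Sigma_yy^{-1}
        cond_means.append(mu_x + gain @ (y_target - mu_y))
        cond_covs.append(s_xx - gain @ s_xy.T)  # s_xy.T equals Sigma_yx
        # Unnormalized weight of component k given the target y
        resp.append(pi_k * gaussian_pdf(y_target, mu_y, s_yy))
    resp = np.array(resp)
    resp /= resp.sum()
    cond_means = np.array(cond_means)
    x_hat = np.sum(resp[:, None] * cond_means, axis=0)
    return resp, cond_means, np.array(cond_covs), x_hat


def sample_x(resp, cond_means, cond_covs, n, seed=0):
    """Draw n candidate latent vectors from the conditional mixture p(x | y)."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(resp), size=n, p=resp)
    return np.array([rng.multivariate_normal(cond_means[k], cond_covs[k]) for k in ks])


# Toy joint GMM over z = [x1, x2, y]: 2-D latent space, 1-D property.
# These parameters are invented for illustration, not taken from the paper.
cov = np.array([[1.0, 0.0, 0.5],
                [0.0, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0, 0.0]), np.array([3.0, 3.0, 5.0])]
covs = [cov, cov]

resp, cond_means, cond_covs, x_hat = gmr_inverse(5.0, weights, means, covs, dim_x=2)
candidates = sample_x(resp, cond_means, cond_covs, n=5)
# Each row of `candidates` (or x_hat itself) would then be passed to the
# chemical VAE decoder to obtain SMILES strings; the decoder is not shown.
```

In this toy case the target y = 5 lies at the centre of the second component, so almost all of the conditional weight goes to it and x_hat lands near (3, 3). Sampling from the conditional mixture rather than taking only the weighted mean is one way to obtain several distinct candidate latent vectors, and hence several candidate molecules, for a single target y.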
               
Click one of the above tabs to view related content.