Research Article | COMPUTATIONAL BIOLOGY

Deep reinforcement learning for de novo drug design


Science Advances, 25 Jul 2018: Vol. 4, no. 7, eaap7885
DOI: 10.1126/sciadv.aap7885
  • Fig. 1 The workflow of deep RL algorithm for generating new SMILES strings of compounds with the desired properties.

    (A) Training step of the generative Stack-RNN. (B) Generation step of the generative Stack-RNN. During training, the input token is a character of the SMILES string currently being processed from the training set. The model outputs the probability vector p_Θ(a_t | s_{t−1}) of the next character given the prefix. The parameter vector Θ is optimized by minimizing the cross-entropy loss. In the generation regime, the input token is the previously generated character, and the next character a_t is sampled randomly from the distribution p_Θ(a_t | s_{t−1}). (C) General pipeline of the RL system for novel compound generation. (D) Scheme of the predictive model. The model takes a SMILES string as input and outputs a single real number, the estimated property value. Its parameters are trained by minimizing the squared (l2) loss.
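The generation regime in (B) amounts to autoregressive character-by-character sampling. A minimal plain-Python sketch of that loop follows; `next_char_probs` is a hypothetical stand-in for the trained Stack-RNN, and the alphabet and end token are illustrative assumptions, not the paper's actual vocabulary:

```python
import random

# Toy SMILES alphabet; "$" is an assumed end-of-sequence token.
ALPHABET = ["C", "c", "O", "N", "1", "(", ")", "=", "$"]

def next_char_probs(prefix):
    """Stand-in for the trained Stack-RNN's p_Theta(a_t | s_{t-1}).
    A real model would condition on the prefix; here we return a
    fixed distribution purely for illustration."""
    return [0.3, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]

def generate(max_len=100, seed=0):
    """Sample one string character by character, as in Fig. 1B."""
    rng = random.Random(seed)
    chars = []
    for _ in range(max_len):
        probs = next_char_probs(chars)
        a_t = rng.choices(ALPHABET, weights=probs, k=1)[0]
        if a_t == "$":      # stop once the end token is sampled
            break
        chars.append(a_t)
    return "".join(chars)

print(generate())
```

With a real model, `next_char_probs` would run one RNN step on the current prefix and return the softmax over the vocabulary; the sampling loop itself is unchanged.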

  • Fig. 2 A sample of molecules produced by the generative model.
  • Fig. 3 Performance of the generative model G, with and without stack-augmented memory.

    (A) Internal diversity of generated libraries. (B) Similarity of the generated libraries to the training data set from the ChEMBL database.
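Internal diversity of a generated library is commonly computed from pairwise Tanimoto similarities of molecular fingerprints. The paper's exact fingerprints and formula are not reproduced here, so the following is only a generic pure-Python sketch over bit-set fingerprints (the fingerprinting step itself is assumed to come from a cheminformatics toolkit):

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two bit-set fingerprints."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def internal_diversity(fingerprints):
    """1 minus the mean pairwise Tanimoto similarity over a library:
    higher values mean the generated molecules are less alike."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    mean_sim = sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_sim

# Toy fingerprints: sets of "on" bit indices.
library = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}]
print(internal_diversity(library))
```

Similarity to a reference set such as ChEMBL (panel B) can be measured the same way, taking for each generated molecule its maximum Tanimoto similarity to any training molecule.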

  • Fig. 4 Property distributions for RL-optimized versus baseline generator model.

    (A) Melting temperature. (B) JAK2 inhibition. (C) Partition coefficient. (D) Number of benzene rings. (E) Number of substituents.

  • Fig. 5 Evolution of generated structures as chemical substructure reward increases.

    (A) Reward proportional to the total number of small group substituents. (B) Reward proportional to the number of benzene rings.
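Rewards of this kind map a computed substructure count onto the scalar that drives the RL update. The paper's exact reward functions are given in fig. S2; the sketch below is only a generic count-proportional reward with an assumed per-unit weight and cap (both parameters are illustrative, not taken from the paper):

```python
def count_reward(n_substructures, per_unit=1.0, cap=10.0):
    """Reward proportional to a substructure count (e.g. benzene
    rings or small-group substituents), clipped at an assumed cap
    so the generator is not pushed toward degenerate molecules
    saturated with the rewarded group."""
    return min(per_unit * n_substructures, cap)

print(count_reward(3))    # proportional regime
print(count_reward(50))   # clipped at the cap
```

In the full pipeline of Fig. 1C, this scalar multiplies the log-likelihood of the generated string in a policy-gradient update, so strings with more of the rewarded substructure become more probable.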

  • Fig. 6 Examples of Stack-RNN cells with interpretable gate activations.

    Color coding corresponds to GRU cells with the hyperbolic tangent (tanh) activation function: dark blue corresponds to an activation value of −1, red to a value of 1, and values in between are colored using a cool-warm color map.
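This color coding can be approximated by linearly interpolating a blue-white-red map over the tanh output range [−1, 1]. The sketch below uses plain blue/white/red endpoints rather than the exact cool-warm colors of the figure:

```python
def coolwarm(x):
    """Map an activation in [-1, 1] to an (r, g, b) triple:
    -1 -> blue, 0 -> white, +1 -> red, linearly in between.
    Out-of-range inputs are clamped."""
    x = max(-1.0, min(1.0, x))
    if x < 0:
        t = x + 1.0                 # 0 at x=-1, 1 at x=0
        return (t, t, 1.0)          # blue -> white
    return (1.0, 1.0 - x, 1.0 - x)  # white -> red

print(coolwarm(-1.0), coolwarm(0.0), coolwarm(1.0))
```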

  • Fig. 7 Clustering of generated molecules by t-SNE.

    Molecules are colored on the basis of the predicted properties by the predictive model P, with values shown by the color bar on the right. (A and C) Examples of the generated molecules randomly picked from matches with ZINC database and property values predicted by the predictive model P. (A) Partition coefficient, logP. (B) Melting temperature, Tm (°C); examples show generated molecules with lowest and highest predicted Tm. (C) JAK2 inhibition, predicted pIC50.

  • Table 1 Comparison of statistics for generated molecular data sets.
    Property                 | Model           | Valid molecules (%) | Mean SAS | Mean molar mass | Mean value of target property | Match with ZINC15 database (%) | Match with ChEMBL database (%)
    Tm                       | Baseline        | 95 | 3.1  | 435.4 | 181  | 4.7 | 1.5
                             | Minimized       | 31 | 3.1  | 279.6 | 137  | 4.6 | 1.6
                             | Maximized       | 53 | 3.4  | 413.2 | 200  | 2.4 | 0.9
    Inhibition of JAK2       | Baseline        | 95 | 3.1  | 435.4 | 5.70 | 4.7 | 1.5
                             | Minimized       | 60 | 3.85 | 481.8 | 4.89 | 2.5 | 1.0
                             | Maximized       | 45 | 3.7  | 275.4 | 7.85 | 4.5 | 1.8
    LogP                     | Baseline        | 95 | 3.1  | 435.4 | 3.63 | 4.7 | 1.5
                             | Range-optimized | 70 | 3.2  | 369.7 | 2.58 | 5.8 | 1.8
    Number of benzene rings  | Baseline        | 95 | 3.1  | 435.4 | 0.59 | 4.7 | 1.5
                             | Maximized       | 83 | 3.15 | 496.0 | 2.41 | 5.5 | 1.6
    Number of substituents   | Baseline        | 95 | 3.1  | 435.4 | 3.8  | 4.7 | 1.5
                             | Maximized       | 80 | 3.5  | 471.7 | 7.93 | 3.1 | 0.7

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/7/eaap7885/DC1

    Fig. S1. Distribution of SAS for the full ChEMBL21 database (~1.5 million molecules), random subsample of 1M molecules from ZINC15, and generated data set of 1M molecules with baseline generator model G.

    Fig. S2. Reward functions.

    Fig. S3. Distributions of SAS for all RL experiments.

    Fig. S4. Distribution of residuals and predicted versus observed plots for predictive models.

    Fig. S5. Learning curve for generative model.

    Fig. S6. Distributions of SMILES string lengths.
