Research Article | Chemistry

Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Science Advances  07 Apr 2021:
Vol. 7, no. 15, eabe4166
DOI: 10.1126/sciadv.abe4166
  • Fig. 1 Overview.

    (A) Process that led to the discovery of the atom-mapping signal and ultimately to the development of RXNMapper. (B) Directly affected chemical reaction prediction tasks. (C) Importance of atom-mapping in affected downstream applications.

  • Fig. 2 Reaction map and examples.

    (A) Visualization of the results on the whole 49k Schneider test set, focusing on the mismatched atom-mappings (shown together with 1.5k matches for context), using reaction tree maps (TMAPs) (41, 58). (B) Examples of atom-mappings generated by RXNMapper. Reactants and reagents were not separated in the inputs.

  • Fig. 3 Atom-mapping on complex reactions.

    Examples and results for commercially available tools from the complex reactions dataset by Jaworski et al. (16). (A) Bu3Al-promoted Claisen rearrangement (47, 48). (B) Palladium-catalyzed semipinacol rearrangement and direct arylation (49). (C) Grubbs-catalyzed ring rearrangement metathesis reaction (50). (D) Ugi reaction (51).

  • Fig. 4 Comparison with other tools.

    (A) Comparison of RXNMapper, Mappet (16), and the original Indigo mapping from the USPTO dataset (281 reactions). The error bars show the Wilson confidence interval (59). (B) Mapping speed comparison between RXNMapper and Indigo (17); Indigo is itself orders of magnitude faster than Mappet (16). For "Indigo 500 ms," we set a timeout of 500 ms, after which the tool would return an incomplete mapping. For Indigo without a timeout, we averaged the timing on the imbalanced reactions over 20k reactions.

  • Table 1 Comparison of different atom-mapping tools.

    Comparing RXNMapper to Indigo (17) and Mappet (16).

                                              RXNMapper   Indigo (17)   Mappet (16)
    Average time (short)                      6.4 ms      17.0 ms       Slower than Indigo
    Average time (strongly unbalanced)        7.7 ms      2400 ms       Not handled
    Quality on complex reactions              High        Low           High
    Quality on strongly unbalanced reactions  High        Low
    Open-source code?                         Yes         Yes           No
  • Table 2 Test datasets.

    Datasets used for the comparison with other tools.

    Test set                   Number of    Average number of    Average number of
                               reactions    reactant atoms       product atoms
    Simple reactions (16)      100          27.1                 27.1
    Typical reactions (16)     100          19.9                 19.6
    Complex reactions (16)     201          25.7                 24.8
    USPTO bond changes (16)    281          26.0                 23.7
    Schneider 50k test (34)    49,000       43.3                 26.1
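
The Wilson confidence interval (59) cited for the error bars in Fig. 4A can be illustrated with a minimal sketch. It assumes the standard Wilson score interval for a binomial proportion at the 95% level (z = 1.96); the confidence level actually used for the figure is not stated in the caption, and the function name and the example counts below are hypothetical, not values from the article.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: number of correctly mapped reactions (hypothetical input)
    total:     number of reactions evaluated
    z:         standard-normal quantile; 1.96 is an ASSUMED 95% level
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p_hat = successes / total                      # observed fraction
    denom = 1.0 + z ** 2 / total                   # shared normalisation term
    center = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical example: 250 correct mappings out of the 281 USPTO reactions.
# The count 250 is made up for illustration only.
low, high = wilson_interval(250, 281)
print(f"observed fraction: {250 / 281:.3f}, interval: [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal approximation, this interval stays within [0, 1] even when the observed fraction is close to 0 or 1, which is one reason it is a common choice for accuracy comparisons on small test sets such as the 281-reaction set.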

Supplementary Materials

  • Supplementary Materials

    Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

    Philippe Schwaller, Benjamin Hoover, Jean-Louis Reymond, Hendrik Strobelt, Teodoro Laino

    The PDF file includes:

    • Detailed evaluation
    • Confidence score
    • Hyperparameters and model selection
    • Visualization of self-attention
    • Figs. S1 to S8
    • Tables S1 and S2
    • Legend for common patent reaction templates
