Simplifying inverse materials design problems for fixed lattices with alchemical chirality

See allHide authors and affiliations

Science Advances  19 May 2021:
Vol. 7, no. 21, eabf1173
DOI: 10.1126/sciadv.abf1173


Brute-force compute campaigns relying on demanding ab initio calculations routinely search for previously unknown materials in chemical compound space (CCS), the vast set of all conceivable stable combinations of elements and structural configurations. Here, we demonstrate that four-dimensional chirality arising from antisymmetry of alchemical perturbations dissects CCS and defines approximate ranks, which reduce its formal dimensionality and break down its combinatorial scaling. The resulting “alchemical” enantiomers have the same electronic energy up to the third order, independent of respective covalent bond topology, imposing relevant constraints on chemical bonding. Alchemical chirality deepens our understanding of CCS and enables the establishment of trends without empiricism for any materials with fixed lattices. We demonstrate the efficacy for three cases: (i) new rules for electronic energy contributions to chemical bonding; (ii) analysis of the electron density of BN-doped benzene; and (iii) ranking over 2000 and 4 million BN-doped naphthalene and picene derivatives, respectively.


The computational simulation of molecules and materials, performed to predict their physical, material, and chemical properties, has become a routine tool in the molecular and materials sciences. Current efforts geared towards computational materials and molecular design might, one day, enable the realization of the holy grail of automatized experimental design and discovery. Driven by the accelerating progress of computer hardware and statistical learning (artificial intelligence), the first seminal examples of integrating sophisticated software and robotics to perform experimental sequences and to establish rules and trends among properties and materials, as well as their synthesis, in realiter have recently been introduced (15). However, the lofty goal of “materials on demand” has still remained elusive, even when doing it just in silico.

The use of empirical trends to guide experimental design has had a long tradition in the chemical sciences. Popular examples include Mendeleev’s discovery of the periodic table, Hammett’s relationship, Pettifor’s numbering scheme, the Bell-Evans-Polanyi principle, Hammond’s postulate, or Pauling’s covalent bond postulate (6). Modern systematic attempts to establish and exploit such rules in terms of quantitative structure-property relationships have led to computationally advanced bio-, chem-, and materials-informatics methodologies (7). Unfortunately, these methods are typically inherently limited to certain applicability domains and do not scale because of their empirical nature (8). To rigorously explore the high-dimensional chemical compound space (CCS) (9), i.e., the combinatorially scaling number of all conceivable molecules or materials (usually defined by composition, constitution, and conformation), the quantum mechanics of electrons ought to be invoked.

It is thus not unexpected that ab initio–based materials design approaches have been at the forefront for more than 20 years (1014) and have played a major role in popularizing the use of efficient and accurate quantum methods, such as density functional theory (DFT) (1519). Sampling CCS from scratch, even when done within efficient optimization algorithms, is typically an encyclopedical endeavor by nature and ignores the many underlying relations among different properties and materials. Quantum machine learning models (20) statistically exploit these implicit correlations, hidden in the data, and have successfully accelerated CCS exploration campaigns (21). Machine learning efficiency and transferability demonstrably benefit greatly from explicitly enforcing known relationships (e.g., translational, rotational, or atom index invariances) directly in the model construction rather than having to learn them agnostically from data (22). Specific examples include explicitly imposing forces and curvatures in the loss function (2325), spatial symmetry relations (26, 27), or arbitrary differential relations (28). However, even for the most efficient and transferable statistical models, e.g., the atom-in-molecule fragment–based approach (29), the acquisition of training data in sufficient quantity and quality requires considerable up-front investments.

Here, we introduce the fundamental notion of a new symmetry relation in CCS, which is fully consistent with the ab initio view of matter (30) and effectively enables us to solve the inverse materials design problem in a nonempirical and highly efficient manner. Spatial symmetry considerations have been crucial for the unraveling of some of the most fundamental laws of nature and are heavily used in many fields. In ab initio calculations, for example, symmetry group theory arguments are common to reduce computational complexity and load. Symmetry constraints on compositional degrees of freedom would be highly desirable to establish general rules among distinct materials and properties and to generally improve our understanding of CCS (30).

In analogy to conventional spatial chirality, we here define “alchemical chirality” as a reflection plane in the space spanned by nuclear charges at fixed atomic positions as they enter the electronic Schrödinger equation. An illustrative comparison is given in Fig. 1 (A and B) for conventional enantiomers consisting of a tetravalent carbon atom with four different substitutions and for alchemical enantiomers consisting of doubly boron nitride (BN)–doped carbon in the diamond crystal structure. Figure 1C compares and relates this newly described alchemical reflection σA with the conventional spatial reflection σS for the same dopant pattern as in Fig. 1B for a single molecular skeleton. In this case, four subsequent reflections alternating between alchemical and spatial reflections return to the original molecule. While any spatial reflection leaves the molecule unchanged, an alchemical reflection affects the nuclear charges and therefore creates a different molecule as reflection image.

Fig. 1 Illustrations of alchemical chirality and spatial chirality.

(A) Enantioselective catalysis enables the synthesis of either conventional enantiomer related to its counterpart through spatial reflection symmetry. (B) Alchemical (not spatial) chirality at four sites in a diamond cubic lattice relates BN-doped alchemical enantiomers through alchemical reflection (boron, nitrogen, and carbon in red, blue, and gray, respectively). (C) Alchemical reflection plane σA for the six carbon nuclei (Z = 6) in benzene connecting 1,2,3,6-tetrahydro-1,2,3,6-diazadiborinine (THDADB) to 1,2,3,4-tetrahydro-1,4,2,3-diazadiborinine compared to the spatial reflection plane σS. Boron, nitrogen, carbon, and hydrogen in red, blue, gray, and white, respectively. (D) Illustration of approximate alchemical reflection plane σA (diagonal) in the space defined by the nuclear charges Z of two atoms I, J in three cyclic CBNC-containing reference molecules (empty symbols). Three reflections via virtual reference with fractional nuclear charges (orange triangle), real reference with two carbons (blue circle), and real reference with two nitrogens (red square) are indicated. The latter reflection does not yield alchemical enantiomers. (E) Chains of approximate alchemical enantiomers. The top left and top right molecules are connected via one reference molecule, while the bottom left and bottom right molecules are connected via another reference molecule. Both molecules on the right are identical, and therefore, the three distinct molecules form a chain of approximate alchemical enantiomers. Transmutating atoms and symmetry-equivalent sites are denoted with nuclear charges and identical arrows, respectively. Electronic energies are given in Hartrees.

Exchange of the dopant atom sequence in Fig. 1B from NBBN to BNNB is an alchemical reflection around pristine diamond: For each site, the nuclear charge difference to diamond gets inverted. Hence, in the space spanned by all nuclear charges of the system, pristine diamond corresponds to a reflection center. All reflections that leave the total sum of all nuclear charges unchanged are defined by a hyperplane (cf. Fig. 1D), which we refer to as the nuclear charge reference plane. In other words, treating the change from pristine diamond to the two doped variants as a perturbation of the system Hamiltonian, there is an antisymmetry relation between these alchemical perturbations.

No other spatial symmetry operation (rotation, reflection, or inversion) can interconvert the constitutional isomers in Fig. 1B, thereby necessitating a fourth dimension, namely, the nuclear charges. This amounts to a four-dimensional (4D) alchemical chirality. Note that chirality crucially depends on dimensionality, e.g., the letter L is chiral within two dimensions only. The chiral center of our chirality operation can be a compound in itself (diamond in Fig. 1B and benzene in Fig. 1C).

Alchemical enantiomers exist only if pairs of distinct atomic environments can be mapped onto each other under a symmetry operation, a consequence of the reflection in nuclear charge space that defines alchemical enantiomers. This implies that the total sum of all nuclear charges of alchemical enantiomers is identical (see Fig. 1D). If a compound has no spatial symmetry, then atom sites with similar electron density derivatives Znρ constitute these pairs just like strictly symmetry-equivalent atoms do. In any case, alchemical chirality requires a one-to-one correspondence of sites with opposite change in nuclear charge. At least two such pairs need to exist in a compound to obtain alchemical enantiomers, which are different chemical objects. For a single pair of symmetry-equivalent atoms, the alchemical reflection in nuclear charge space would trivially connect two spatially symmetric compounds, e.g. CO and OC.

In summary, exact alchemical enantiomers are defined as two spatially non-superimposable, alchemically coupled, and iso-electronic compounds with the same formal charge, where each transmutating atom is assigned to exactly one subset within each of which averaging of nuclear charges results in identical chemical environments. Note that alchemical enantiomers can differ in chemical compositions, and that any given alchemical enantiomer can have as many alchemical mirror images as different valid averaged reference compounds can be defined. An alchemical enantiomer and all its possible mirror images are degenerate in the electronic energy up to the third order. Approximate alchemical enantiomers differ from their exact analog in that the pair-wise averaging results in similar, rather than identical, atoms within each pair. Alchemical enantiomers have approximately the same electronic energy (see the “Materials and Methods” section below). This symmetry is then broken by (i) the nuclear-nuclear repulsion and (ii) geometry relaxation into distinct total energy minima. The nuclear repulsion will typically dominate, implying that alchemical enantiomers with spatially closer atoms of higher nuclear charge will exhibit higher total energies than their respective mirror images.

While the restriction to fixed atomic positions might seem severe at first, we highlight in the following the importance and relevance of classes of fixed configurational frameworks for materials design applications. More specifically, we discuss the implications of alchemical chirality for system classes that are low dimensional in their structural degrees of freedom and where the problem is dominated by the combinatorial scaling due to varying chemical composition.

Examples abound and include any variation of graphitic motifs (studied below) prevalent in nanotechnological applications, in inorganic materials such as MAX-phases, or in organic electronics, e.g., polycyclic aromatic hydrocarbon derivatives and other rigid scaffolds, such as metal-organic frameworks or perovskites. All these systems would be directly amenable to alchemical chirality–based estimates (ACEs), i.e., approximating the electronic energies for alchemical enantiomers as degenerate and adding nuclear repulsions to obtain relative energies. This way, ACEs enable the ranking and grouping of large subsets of materials. Within any of these classes, the number of possible materials increases combinatorially with the number of building blocks. For real-world applications, defects often need to be included in the quantum chemistry model, which further increases the chemical space under consideration. For particularly rigid frameworks and extended periodic materials, only a local and low-dimensional geometric response to a change of composition is typically observed. Hence, ACEs amount to a complementary means to rigorously sample compositional ensembles within a given framework class. As shown below, ACEs become an enabling tool for the systematic identification of structure property relations within given compound classes that further the understanding of the respective impact of the compositional degrees of freedom in materials design. Here, we exemplify this by identifying design rules from analysis of more than 400 million alchemical estimates of BN-doped derivatives in the picene framework.


Estimating bond energies

The external potential energy differences between the reference compound (defining the reflection plane such as carbon atoms in diamond in Fig. 1B) and either alchemical enantiomer are exactly mirrored in magnitude (opposite sign). Hence, while the corresponding electronic Hamiltonians are alchemical mirror images of each other, their respective solutions to the electronic Schrödinger equation are not necessarily equal. More specifically, for reflections around atoms with identical chemical environments, we show that the electronic energy of corresponding alchemical enantiomers must be degenerate up to the third order within alchemical perturbation DFT (APDFT) (31) (see the “Materials and Methods” section below), higher-order terms carrying different signs. Hence, approximate alchemical enantiomers are only approximately degenerate in their electronic energy.

It turns out, however, that this approximation is quite fair and that useful rules for chemical binding can be derived. For example, by enumerating all colored connected graphs that are subgraphs of hexagonal lattices, we obtain the following ACE of two-body interatomic bonding for the electronic energyEQRESR+0.5×(EQQESS)(1)where QRS correspond to three adjacent elements in the periodic table. Hence, this rule confirms the naive expectation that given two heteroatomic bonds—QR and SR—to the same reference atom R, that electronic contribution to bonding will be larger, which contains the heteroatom with the larger electronic homoatomic bond, i.e., QQ versus SS, respectively. We believe that this simple rule was hidden in the past because of the obfuscation caused by the contributing nuclear repulsion term.

While seemingly reminiscent of Pauling’s electronegativity-based bond energy estimate (6), EQS = 0.5 × (EQQ + ESS) + Δχ, we stress that this ACE relation is derived from pure symmetry and perturbation theory–based considerations that, for alchemical enantiomers, are exact up to the third order. Pauling, by contrast, merely postulated his Ansatz. Our rule relies on bonding information involving three distinct and adjacent chemical elements rather than just two, and maybe more importantly, it pertains to the electronic potential energy only, i.e., without the nuclear repulsion, which is trivial to add a posteriori. This rule is easily verified for the example of BC and CN bonds in tetrahydrodiazadiborine (THDADB) by simply comparing the sum of all bond energies in either alchemical enantiomer (resulting in 2EBC + ENN ≈ 2ENC + EBB).

Subtracting nuclear repulsion estimates from published DFT data for all single bonds between main-group elements from the second and third periods saturated by hydrogen (32), ACEs yield bond energies of elements QR, i.e., solely based on bonding information of elements SR, QQ, and SS for adjacent elements QRS in the same period. As shown in Fig. 2, predicting bond dissociation energies among elements R and Q or S (to the left or right in the period) generates remarkably accurate predictions for bonds among elements in either period with a mean absolute error of ∼10 kJ/mol, not far from the highly coveted “chemical accuracy” of ∼4 kJ/mol. A linear fit through all the diverse chemistries encountered yields a slope of 1.04, an offset of ∼7 kJ/mol, and a correlation coefficient of 0.981. Successive daisy chaining across groups in any given period represents a straightforward extension, which markedly reduces the number of reference bonds required for ACE—but only at the cost of reduced accuracy: The mean absolute error (MAE) increases to ∼22 kJ/mol, i.e., still better or on par with common generalized gradient–based approximations to the exchange correlation potential in DFT (33).

Fig. 2 Predicted (EACE) from ACEs versus true (Et) single-bond dissociation energies.

These are predicted according to the two-body rule for adjacent elements QRS using published DFT results for main-group elements in the second and third rows of the periodic table and assuming all bond distances to correspond to 2 Bohr. Predictions of bonding energies between QR and RS elements are denoted as LHS and RHS, respectively. Successive recursive application of the rule throughout either period is denoted by “rec”. Inset: Each field in the matrix corresponds to a bond of the elements in the row and column headers. Arrows show which values, when combined together, yield a predicted bond energy. This shows the symmetry in going forward (RHS) and backward (LHS) through the groups in the periodic table. Using only the initial off-diagonal matrix element and then using one prediction to obtain the next following the arrows would correspond to rec in the main figure.

Furthermore, ACE rules emerge when increasing the alchemical nuclear charge radius (ΔZ = ±1, ±2) with PQRST corresponding to five adjacent elements in the periodic table: EPR ≈ ETR + (EPP − ETT)/2, EPQ + EQT + 2ESR ≈ EPS + EST + 2EQR, and 2EPR + EQT + EST ≈ EPQ + EPS + 2ETR. These rules are identical for the 3D diamond lattice and the 2D hexagonal graphene structure. Note that other graph lattices could yield additional rules and that rules for interatomic 3- and n-body contributions to binding exist as well. For example, invoking ΔZ = ± 1 only, one finds the three-body rule that ESQS + 2ERSQ + ERQR ≈ EQSQ + 2ERQS + ERSR. A systematic enumeration of ACE rules for two- and three-body terms for graphene and diamond lattices is given in the Supplementary Materials.

Ranking molecules

To apply ACE to inverse materials design problems, we showcase applications in well-defined subregions of CCS and solve three specific and increasingly challenging design tasks. All three use cases address the combinatorial design problem of how to dope planar hexagonal lattices to an increasing extent (6, 10, and 22 carbon atoms, respectively). Hexagonal lattices are archetypical scaffolds, e.g., relevant in the design of graphene-inspired materials for nanotechnological devices (34), catalytic surfaces (3537), porous BN-doped–based nanofibers for battery materials (38), or 2D materials in general (39). Just doping with BN already results in a combinatorial explosion; already for the 77 smallest benzoid-like structures, the number of possible unique BN-doped derivatives exceeds 7 tera (40). Hence, we believe that it is warranted, and without loss of generality, to focus on constitutional isomers in rigid lattices for which thermal or geometrical distortions can be neglected. Note that relative offsets in total energies due to differences in stoichiometry are straightforward to estimate, and typically occur on different orders of magnitude, and that subsequent inclusion of configurational degrees of freedom within alchemical predictions is possible, as already demonstrated for small molecules (41, 42) and ionic, metallic, and semiconducting solids (4345). Hence, the focus of the following applications lies on breaking down the combinatorially scaling problem of energy predictions throughout colored chemical bond connectivity graphs.

As a first use case, we provide an in-depth but intuitive illustration of alchemical chirality with benzene, its six equivalent carbon atoms as the mirroring reflection plane, and the alchemical enantiomers corresponding to carborazine [C2H6B2N2, THDADB]. Because second-order APDFT—resulting in alchemical normal modes to define a complete basis in certain subspaces of CCS—has already been applied to benzene (46), the occurrence of six degenerate estimates among the 11 constitutional isomers of THDADB (C2H6B2N2) can now be readily explained, thanks to alchemical chirality: They represent three pairs of alchemical enantiomers. For the select enantiomer pair BNNBCC/NBBNCC, a molecular planar analog to the crystalline example (Fig. 1B) is given in Fig. 1C, contrasting the conventional spatial reflection plane σs with the alchemical reflection plane σA. While NBBNCC and BNNBCC are mutually achiral in the sense of conventional spatial 3D chirality, the alchemical chirality relation between the two becomes obvious with regular benzene corresponding to the reflection plane.

In Fig. 3, the corresponding perturbing potentials (exact mirror images of each other) for the two benzene enantiomers from Fig. 1C are shown, as well as the resulting electron density derivatives of benzene up to the fourth order. Corresponding figures for all other BN-doped benzene mutants are provided in the Supplementary Materials. As it becomes obvious already by visual inspection, the perturbed densities are nearly identical for even orders and antisymmetric for odd orders. This indicates that the perturbational series in APDFT (31) extrapolates to an approximately equal amount, leading to the near degeneracy of the electronic energies of the two alchemical enantiomers. To quantify this effect, Fig. 3 also shows the corresponding APDFT-based electronic energy estimates up to the fifth order (within perturbation theory the energy order precedes the wave function order), numerically demonstrating that the degeneracy is exact in the second order and starts to deviate by ∼0.01, 0.02, and 0.025 Ha for the third, fourth, and fifth orders, respectively. The fifth-order predictions deviate from the actual electronic energies by ∼0.03 and 0.02 Ha, respectively. Addition of the nuclear repulsion terms typically lifts the approximate degeneracy, resulting in quantitative ACEs and simple inequality rules for total energies. Hence, EBNNBCC > ENBBNCC because of the larger nuclear repulsion experienced when atoms with larger nuclear charges (nitrogens in this case) are closer in proximity to each other than atoms with lower nuclear charges (boron).

Fig. 3 Alchemical enantiomers in carborazine [C2H6B2N2, tetrahydrodiazadiborine (THDADB)] with the reference molecule benzene (C6H6).

Left-hand column: the perturbing potential ΔV acting on benzene. Remaining columns correspond to electron density derivatives at CCSD/def2-TZVP level. Positive/negative values are shown in red/blue, respectively. All density contour lines are set at the same percentiles for each plot to render their shape comparable at different magnitudes. The percentiles are chosen to emphasize extremal values. Next to each electron density derivative, the corresponding electronic energy estimate by APDFT is given in Hartree for energy order 1 up to 5. The actual electronic energies are −438.413 Ha (top) and −438.401 Ha (bottom).

Such accuracy, while not on par with explicitly correlated quantum chemistry calculations in large basis sets, is on a similar scale as generalized gradient–approximated DFT or semiempirical quantum chemistry methods—typically sufficiently accurate for computational materials design studies as demonstrated for the many examples of successful computational discovery of heterogeneous catalysts (47) or the materials project dataset (48). The following two use cases explore this question for ACE-based ranking in a more systematic and comprehensive fashion.

Because the equivalence of the electronic energies is transitive, having a second reference that connects one alchemical enantiomer with a third one allows us to build chains of alchemical enantiomers that must have similar electronic energies, consequently enabling a ranking of molecules within one such group solely by the magnitude of their respective nuclear repulsion. Figure 4A illustrates the chaining of exact (σA) and approximate (σA) alchemical enantiomers for BN-doped naphthalene where four pairs of alchemical enantiomers could form such a chain. For reflections along the exact alchemical symmetry axis σA, the energy difference is 1 or 8 mHa, respectively. For cases with approximate alchemical symmetries σA, this energy difference is notably larger (237 mHa) but typically still much smaller than the nuclear repulsion energy contributions.

Fig. 4 Ranking molecules via chains of alchemical enantiomers.

(A) Each alchemical symmetry plane is denoted by a gray line and the reference molecule, the chiral center. Electronic energies given in Hartrees. (B) All 2285 unique BN-doped naphthalene derivatives ranked by their total energy as obtained from quantum chemistry calculations compared to the ranking from ACEs. Histogram is colored by number of molecules in a given bin. Two representative molecules of same sum formula C4(BN)3. Inset shows the distribution of rank errors sorted in ascending order for different methods: the force field UFF, bond counting (BC), semiempirical PM6, ACE (this work), semiempirical xTB-GFN2 (xTB), and the DFT methods PBE and PBE0, both with density-fitted cc-pVDZ basis set. Side panels show bond-type frequencies.

Overall, exhaustive charge-neutral and isoelectronic BN doping in naphthalene leads to 2285 unique derivatives for which we have used ACE to establish energetic ranks. Having identified all alchemical enantiomers through exhaustive scanning within one stoichiometry [stoichiometries as a whole can be ranked with existing relations (4951)], we have ranked all possible molecules using ACE within groups of molecules approximately degenerate in electronic energy. These groups form the connected components of a graph where all molecules are nodes and only alchemical enantiomers are connected, thus allowing us to exploit the transitivity of the electronic energy degeneracy. Within these groups, the nuclear repulsion dominates the ranking. Among the groups, we rank on the basis of averaged bond-counting results for their bond energies, where bond energies have been obtained by fitting to the imposed energy degeneracies among all pairs.

Using Coupled cluster with singles and doubles (CCSD/cc-pVDZ) as a ground-truth Quantum Mechanics (QM) method for all naphthalene derivatives on preoptimized geometry (40), we have performed an exhaustive validation of alchemical ranking, resulting in a Spearman correlation coefficient of 0.9899. Dissecting CCS in all the various possible stoichiometries of BN-doped naphthalene, the scatterplot of ACE rank versus QM rank results in a very reasonable correlation, as shown in Fig. 4B. To place ACE-based ranking in perspective, the inset of Fig. 4B also reports corresponding sorted ranking errors (wrt. CCSD) when using PBE0 (5254), PBE (16), xTB (55), PM6 (56), bond counting, and UFF (57) with respective Spearman correlation coefficients of 0.9998, 0.9983, 0.9966, 0.9592, 0.9562, and 0.9021. In terms of computational cost invested to rank all naphthalene derivatives, ACE ranking is slightly more expensive than bond counting, UFF, or PM6. In terms of accuracy; however, it clearly outperforms all three—without any need for empirical knowledge or fitting to external data. It is worth noting that bond counting is the only method that has been reparametrized on this particular dataset, which explains its good performance compared to PM6. As also shown in Fig. 4B, ACE reproduces bond-type frequency trends as a function of rank. While the accuracy of ACE itself is not on par with more advanced semiempirical quantum chemistry (xTB), the sorted ranking error distribution suggests that alchemical ranking is closer to xTB than it is to PM6. However, in contrast to alchemical ranking, xTB relies on substantial empirical data for fitting. While PBE and PBE0 are obviously more accurate, their computational cost is also three orders of magnitude larger than xTB (∼40 s). Hence, we conclude that alchemical ranking is superior to bond counting and PM6 and could represent a viable and less empirical alternative to xTB if accuracy thresholds are less stringent and computational load is high.

As a third use case, we have used ACE to explore and deepen our understanding of the CCS spanned by the 413,887,189 unique k-fold BN-doped derivatives of picene (see Fig. 5, with k < 7 and k > 9). Dissecting its CCS first by all stoichiometries, we can use the trends to map out obvious structural features as a function of rank to detect useful descriptors for structure-property relationships. Roughly speaking, results in Fig. 5 suggest that the energy will typically decrease within any given stoichiometry as the number of CC, BN, NN, and BB bonds increases, increases, decreases, and decreases, respectively. Differences in B and N counts in any ring (a measure of compositional homogeneity per ring), hardly affects the energy except for the heavily BN-doped stoichiometries (11 and 10 BN pairs). The degree of clustering (as measured by root mean squared distances) varies wildly with little correlation for BB, BN, and NN, as long as only few sites have been doped. As the degree of BN doping increases, the strong stabilizing effect of BN bonds subdivides the constitutional isomers into groups of identical BN bond count within which boron clusters are stabilizing while nitrogen clusters are destabilizing. Because the BB bond effect is visible for any fixed number of BN bonds, we can conclude that the impact of BN bonds on stability is larger than the one of BB bonds. Following this concept of conditional order and with the data shown in Fig. 5 and the extended version thereof in the Supplementary Materials, we can identify the following stabilizing design patterns in decreasing order of strength: (i) add BN pairs, (ii) maximize number of CC bonds, (iii) substitute sites shared between rings, (iv) maximize number of BN bonds, (v) avoid N substitutions on rings sharing a larger amount of bonds with other rings, and (vi) balance BN substitutions in each ring. Note how ACE has given us access to a complete ranking without a single quantum chemistry calculation. While the individual rank might not be completely accurate, as shown in Fig. 4B, the emerging pattern yields relevant and novel structure property trends as a direct consequence of alchemical chirality.

Fig. 5 Trends among all 414 million BN-doped picenes of the select stoichiometries (top) ranked by their total energy on the basis of ACE.

Each row shows a structural feature for each stoichiometry as a function of the ACE rank and averaged over more than 200 bins. Those molecules that are the most or least stable for each stoichiometry are listed separately.

A common problem in materials design consists of identifying global optima. In Fig. 5, we also show the least and most stable BN-doped derivatives for each stoichiometry, as identified by ACE. As more and more carbon sites are BN doped, the sites interact more strongly, and patterns emerge that are in line with the aforementioned summary observations, e.g., the energetically unfavorable nature of nitrogen clusters. Thus, even if high accuracy is needed eventually, prefiltering with ACE can markedly accelerate the identification and discovery of candidate compound lists that are to be treated with higher level quantum methods subsequently.


We have introduced “alchemical” 4D chirality, resulting from a reflection in the nuclear charge space manifested by the external potentials in the electronic Hamiltonian. This symmetry relation is exact for the perturbing Hamiltonians of the corresponding enantiomers. The corresponding variational electronic energies are degenerate only up to the third order and reminiscent of parity violation (58), which also lifts the exact energy degeneracy between spatial enantiomers (albeit less by many orders of magnitude). Our numerical findings indicate that ACEs solely from symmetry considerations alone are sufficiently accurate to enable the exploration of substantial swaths of subdomains in CCS. From a practical point of view, both experiments and simulations have a resolution limit resulting from method uncertainty beyond which molecules or crystals are indistinguishable. Alchemical chirality offers a new way to find those symmetrically related enantiomers with practically identical energies from which only one needs to be considered in terms of measurements or calculations. In the current state, this could be applied to, e. g., surface adsorption problems such as surface catalysis or molecular sensing. Future work on bounds on approximate alchemical enantiomers for relaxed geometries might give access to weak interaction problems as they occur in molecular self-assembly.

Specifically, we have provided novel ACE-based bonding rules of chemical bonds and angles. Numerical evidence for single bonds of second- and third-row main-group elements even suggests that DFT level of accuracy can be reached with these rules. CCSD-based perturbation theory results for THDADB (and all other possible BN-doped benzene derivatives) have served the illustration of alchemical chirality, indicating near degeneracy for the electronic energies of alchemical enantiomers, deviating by roughly two orders of magnitude less than a covalent bond. Correspondingly, ACE can serve the energy ranking of more complex systems, as discussed for the two use cases of BN doping in naphthalene and in picene. Using CCSD/cc-pVDZ as a reference, the comparison to bond counting, semiempirical DFT, generalized gradient–approximated DFT, and hybrid DFT results for the more than 2000 naphthalene derivatives indicates that the alchemical chirality–based ranking outperforms bond counting and common semiempirical quantum chemistry (PM6), while approaching the performance of semiempirical DFT (xTB) in terms of fidelity—at negligible computational cost and without empiricism. For the BN doping of picene, an ACE of more than 400 million unique derivatives has enabled the establishment of structural trends and the identification of the least and most stable derivatives for each stoichiometry. We stress that for all the alchemical chirality–derived results presented, not a single quantum chemistry calculation was necessary.

Overall, our arguments and numerical results indicate that the concept of alchemical chirality represents a novel, fundamental, and useful symmetry relation in CCS. Its power to dissect, group, and rank throughout the CCS of constitutional isomers holds great promise to further progress toward the overall goal of virtual computational materials discovery. Future work will deal with current limitations, such as the necessity to perform alchemical changes only in close vicinity in nuclear charge space, to restrict changes to be isoelectronic and neutral, or to rely on scaffold lattices with fixed nuclear positions. It would also be interesting to study how alchemical chirality can be exploited using quantum machine learning models.


Previous methodological work tackling CCS from first principles through variable (alchemical) nuclear charges included 4D DFT (59), the use of thermodynamic integration (60), trends among vertical energies (50), entire potential energy surfaces (61), nuclear grand-canonical ensembles (62, 63), linear combinations of atomic potentials (64), and APDFT (31). Starting in 1996 with stability of solid solutions (65), multiple promising applications, based on quantum alchemical changes, have been published over recent years, including thermodynamic integrations (66), mixtures in metal clusters (67, 68), reactivity estimates (69), water adsorption on BN-doped graphene (70), BN doping in fullerenes (71), and protonation energy predictions (72, 73) However, apart from nearsightedness studies on chemical transferability (74), general rules rooted in the quantum mechanical framework of variable composition are mostly still lacking. Here, we describe alchemical chirality, defined by compositional reflection in the nuclear charge mirror plane of some reference system. Such a reflection defines alchemical enantiomers as distinct constitutional isomers with electronic energies being identical up to the third order. Alchemical chirality relates distinct molecules and materials in ways that, to the best of our knowledge, have not yet been discussed before.

Calculating the total potential energy of any compound U, most commonly obtained within the Born-Oppenheimer approximation and neglecting relativistic effects, represents the probably most crucial step in any atomistic simulation study. It consists of two contributions, the nuclear repulsion, which can be evaluated trivially, and the more complex electronic energy E, which, within the picture of DFT, sums up the electrons’ kinetic contributions as well as their interactions with each other and with the nuclear charges. Hence, E is key and typically represents the principal goal of most modern electronic structure theory developments, including improved DFT approximations. Furthermore, textbook discussions, such as the virial theorem in physical chemistry, deal with the discussion of the chemical bond in terms of the electronic energy.

The difference in electronic energy E between a reference reflection molecule, constituting a maximum in electronic energy, with electron density ρr and some “adjacent” isoelectronic alchemical enantiomer i, ΔE = Ei(r)] − Er(r)], can be obtained through thermodynamic integration over the coupling constant 0 ≤ λi ≤ 1, which linearly interpolates the nuclear charges between reflection molecule and alchemical enantiomer. According to Hellmann-Feynman, ΔE=+dr Δvri(r)01dλi ρ(r,λi), with Δvri(r) as the difference in external potential between reflection molecule and alchemical enantiomer (31, 75).

Approximating this difference by a Taylor series expansion (n=11n!+dr Δvri(r)λin1ρ(r)λ=0), subtracting the energy of the other alchemical enantiomer j (i.e., Δvri = − Δvrj) and rearranging results inΔEijsymn=11n!+dr Δvri(r)(n1ρr(r)λin1+n1ρr(r)λjn1)(2)which is zero for all orders n as long as λin1ρr=λjn1ρr. Equation 2 must be exactly zero up to the third order because (i) the zeroth-order term Er] cancels, (ii) the first-order Hellmann-Feynman term ∫drΔvriρr is zero for highly symmetric systems (as necessary to define an alchemical reflection plane) due to Δv and ρr being odd and even functions (46), and (iii) due to the second-order term canceling because ∂λρr = ΣIΔZIZIρr differs for i and j only by the sign of ΔZI. Consequently, knowing all the higher-order contributions for one alchemical enantiomer also yields all higher-order terms for the other enantiomer without any further calculation. In other words, if a molecule contains a set of disjoint pairs of atoms of the same elements, which share nearly the same chemical environments, then it can be seen as the nuclear charge reflection plane, or chiral center, of all corresponding alchemical enantiomers. For example, the benzene molecule is the chiral center for three pairs of alchemical enantiomers (for ΔZ = ± 1, i.e., BN doping), all discussed in the Supplementary Materials and one highlighted in Figs. 3 and 1. For isoelectronic charge-neutral mutations, alchemical enantiomers differ from their chiral center only in their nuclear charges such that the net nuclear charge difference of each atom pair is zero. We used nauty (76) for graph enumeration and the Coulomb Matrix (77) as implemented in qmlcode (78) as similarity measure. Reference calculations were performed with Molpro (79) and MRCC (80, 81), in part using basis set extrapolation (82).


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We thank S. Fias, G. v. Hahn, R. Ramakrishnan, A. Savin, and J. Schrier for discussions. Some calculations were performed at sciCORE ( scientific computing core facility at University of Basel. Funding: We acknowledge support by the Swiss National Science foundation (no. PP00P2_138932, 407540_167186 NFP 75 Big Data, 200021_175747, NCCR MARVEL) and by the European Research Council (ERC-CoG grant QML #772834). Author contributions: G.F.v.R. acquired data and wrote the new software used in the work. Both G.F.v.R. and O.A.v.L. conceived and planned the project, analyzed and interpreted the results, and wrote the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper, online (83), and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Stay Connected to Science Advances

Navigate This Article