Inverse design of porous materials using artificial neural networks


Science Advances  03 Jan 2020:
Vol. 6, no. 1, eaax9324
DOI: 10.1126/sciadv.aax9324


Generating optimal nanomaterials using artificial neural networks can potentially lead to a notable revolution in future materials design. Although progress has been made in creating small and simple molecules, complex materials such as crystalline porous materials have yet to be generated using neural networks. Here, we have implemented a generative adversarial network that uses a training set of 31,713 known zeolites to produce 121 crystalline porous materials. Our neural network takes inputs in the form of energy and material dimensions, and we show that zeolites within a user-desired 4 kJ/mol range of methane heat of adsorption can be reliably produced using our neural network. This ability to fine-tune generation toward user-desired properties can potentially accelerate materials development, as it demonstrates a successful case of inverse design of porous materials.


The quest to discover new materials using artificial intelligence has spawned a great deal of research in the past few years. Subsequently, significant progress has been made in using various artificial neural networks (ANNs) to generate undiscovered molecules and materials (1–3). Unfortunately, no one has yet successfully used ANNs to create novel crystalline materials, as machine learning has so far been applied only to predict material properties such as compositions, bandgap energy, formation energy, and gas adsorption uptakes (4–10). Among crystalline materials, porous materials contain dense arrangements of microscopic pores that lead to high surface area and pore volume, and as such are viewed as an important class of materials for many energy- and environment-related applications (11, 12). It is our opinion that these porous materials [e.g., zeolites, metal-organic frameworks (MOFs), and covalent organic frameworks (COFs)] are especially challenging to generate using ANNs due to their relatively complex topologies compared with other crystalline materials. Moreover, various other factors (e.g., nonunique representation of the unit cell, complex chemistry, ambiguous assignment of the lattice parameters, and constraints of periodic boundary conditions) all contribute to the challenge of using ANNs for crystalline materials generation.

Here, we devised an ANN that can successfully generate crystalline porous materials. Specifically, we targeted a case study problem of generating pure silica zeolite structures due to their relative simplicity (e.g., only two atom types, silicon and oxygen) and the wealth of materials available [i.e., 234 International Zeolite Association (IZA) experimental structures and 331,163 Predicted Crystallography Open Database (PCOD) hypothetical zeolites (13–15)] that can be used to train the neural network (see Materials and Methods). It must be noted that Deem and co-workers (13, 14) have already developed a Monte Carlo–based algorithm to generate pure silica zeolites. Moreover, many researchers have used both bottom-up (e.g., self-assembly based on building blocks) and top-down (e.g., topology-based construction of porous materials) approaches to generate porous materials such as MOFs and COFs (16–18). However, most of these algorithms are deficient in the sense that finely tuned, user-desired properties cannot be targeted during the materials generation stage. As such, while there are a few recent examples of using evolutionary algorithms to target material properties (19–22), most of these conventional methods lead to brute-force generation of porous materials, which necessitates a computationally expensive screening procedure to identify the optimal materials for a given application. Moreover, most of the computational screening work conducted on large databases of materials has revealed that the majority of these generated materials have poor properties, leading to inefficient allocation of computational resources (16, 23, 24). By designing the neural network to represent the inputs in both the material and energy dimensions, our algorithm has a unique advantage: inverse design of materials can be achieved by biasing the energy dimension, which correlates with material properties.


Generative adversarial network for zeolites

Among the many available ANN architectures, a generative adversarial network (GAN) was used to produce crystalline porous materials because of its demonstrated capability to generate realistic objects such as human faces (25, 26). A GAN consists of a discriminator and a generator: the discriminator aims to differentiate between real and fake data, whereas the generator tries to deceive the discriminator by progressively creating more realistic fake objects. This setup leads to adversarial learning, in which realistic objects are generated as a by-product of the improvement of both the discriminator and the generator. Arjovsky et al. (27) proposed an improved version of GAN called Wasserstein GAN (WGAN). In WGAN, the discriminator is trained to estimate the earth mover's distance (EMD) between the data distribution and the generator distribution, and the generator is trained to minimize the EMD by generating realistic samples. Because it outputs a score rather than a real/fake classification, the discriminator is renamed the “critic.” The critic has to be 1-Lipschitz continuous, and in the original paper this constraint is enforced by weight clipping. More recently, Gulrajani et al. (28) modified WGAN with a gradient penalty (WGAN-GP, which is the basis of our ANN), in which the constraint is instead imposed by adding a gradient penalty term to the critic's loss function.
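As a rough illustration of the WGAN-GP objective described above, the following NumPy sketch computes the critic loss for an arbitrary scalar critic: the Wasserstein term (mean critic score on fake samples minus mean score on real samples) plus a penalty λ(‖∇f(x̂)‖ − 1)² evaluated at random interpolates x̂ between paired real and fake samples. It is not ZeoGAN's implementation: the helper names are hypothetical, λ = 10 is the value used by Gulrajani et al. rather than a ZeoGAN setting (ZeoGAN's hyperparameters are listed in table S1), and gradients are taken by central finite differences instead of the automatic differentiation a deep learning framework would provide.

```python
import numpy as np


def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at a 1-D point x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


def critic_loss_wgan_gp(critic, real, fake, lam=10.0, rng=None):
    """WGAN-GP critic loss for paired batches of flattened samples (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Wasserstein term: minimizing it pushes real scores up and fake scores down
    w_term = np.mean([critic(x) for x in fake]) - np.mean([critic(x) for x in real])
    # Gradient penalty at random points on the lines between real and fake samples
    penalties = []
    for x_real, x_fake in zip(real, fake):
        t = rng.uniform()
        x_hat = t * x_real + (1.0 - t) * x_fake
        grad_norm = np.linalg.norm(numerical_grad(critic, x_hat))
        penalties.append((grad_norm - 1.0) ** 2)
    return w_term + lam * np.mean(penalties)


def generator_loss_wgan(critic, fake):
    """The generator tries to raise the critic's score on its own samples."""
    return -np.mean([critic(x) for x in fake])
```

With a linear critic f(x) = w·x and ‖w‖ = 1, the gradient norm is exactly 1 everywhere, so the penalty vanishes and the loss reduces to the Wasserstein term; rescaling w to ‖w‖ = 2 adds λ(2 − 1)² = 10.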

With the specific goal of generating both materials and energy shapes, we developed a new type of GAN named zeolite GAN (ZeoGAN). In previous work, we developed a neural network that could generate only the energy shapes; materials generation was not possible (29). Here, as a test case, the energy dimension was designated to be the methane potential energy, owing to its importance in various methane storage applications (30, 31) and the facile creation of methane energy grids (and hence fast generation of the training set for machine learning purposes) using classical molecular simulations (32). Although restricted to methane in this study, our ANN can, in practice, be easily generalized to other gas molecules (e.g., hydrogen, water, and carbon dioxide), since only the choice of gas molecule in the classical molecular simulations would need to change.

The overall schematic of ZeoGAN is shown in Fig. 1 (details in sections S1-1 and S1-2). The input to the neural network is divided into the material and the energy grids, with the material grid further subdivided into silicon and oxygen atom grids. Each of the three grids consists of 32 × 32 × 32 points in fractional coordinates, with equal mesh spacing within a given zeolite unit cell. Fractional coordinates were used so that the grids have the same size for all zeolite materials. The number of grid points was kept small to reduce memory cost, as larger grids lead to a very slow learning process. The positions of the silicon and oxygen atoms are represented by Gaussian functions centered on the atomic positions. Since the silicon and oxygen grids are kept separate, akin to RGB color channels, both Gaussian functions were assigned the same amplitude of 1.0 and variance of 0.5. Both the methane potential energy grids and the material grids were generated using conventional molecular simulations (see Materials and Methods). The three grids (silicon, oxygen, and methane potential energy) are combined into a single tensor, which serves as the input to ZeoGAN, and ZeoGAN is trained to generate realistic tensors that resemble those calculated from real zeolites.
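A minimal sketch of this input encoding follows, assuming an orthogonal unit cell (the training set described in Materials and Methods is restricted to orthogonal cells), grid points at fractional coordinates i/32, Gaussians of the form A·exp(−r²/2σ²) with A = 1.0 and σ² = 0.5 interpreted in Å², and summation where Gaussians of the same atom type overlap. The unit of the variance and the treatment of overlaps are assumptions, and the function names are hypothetical. Distances use the minimum-image convention so that atoms near one face of the cell also populate grid points near the opposite face.

```python
import numpy as np

GRID = 32  # grid points per cell edge, as stated in the text


def atom_channel(frac_positions, cell_lengths, variance=0.5, amplitude=1.0, n=GRID):
    """Gaussian density grid for one atom type (hypothetical helper).

    Assumptions: orthogonal cell, variance in Å^2, overlapping Gaussians summed.
    """
    axis = np.arange(n) / n  # fractional coordinate of each grid point
    fx, fy, fz = np.meshgrid(axis, axis, axis, indexing="ij")
    cell = np.asarray(cell_lengths, dtype=float)
    grid = np.zeros((n, n, n))
    for p in np.asarray(frac_positions, dtype=float):
        # Minimum-image displacement in fractional coordinates (periodic cell)
        d = [f - c for f, c in zip((fx, fy, fz), p)]
        d = [di - np.round(di) for di in d]
        # Squared Cartesian distance for an orthogonal cell
        r2 = sum((di * length) ** 2 for di, length in zip(d, cell))
        grid += amplitude * np.exp(-r2 / (2.0 * variance))
    return grid


def zeolite_tensor(si_frac, o_frac, energy_grid, cell_lengths):
    """Stack Si, O, and methane-energy channels into one (3, n, n, n) tensor."""
    si = atom_channel(si_frac, cell_lengths)
    o = atom_channel(o_frac, cell_lengths)
    return np.stack([si, o, np.asarray(energy_grid, dtype=float)], axis=0)
```

Keeping silicon and oxygen in separate channels, as the text notes, means each channel can use the same Gaussian amplitude without the two atom types being confused.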

Fig. 1 Overall schematics of the ZeoGAN.

Energy (green) in this case refers to methane potential energy, and material grids indicate silicon (red) and oxygen (yellow) atoms.

The goal of the ZeoGAN generator is to produce realistic zeolite materials and their corresponding energy shapes; to this end, several features were added to ZeoGAN. Within the critic, periodic padding was implemented to impose a severe penalty for generating nonrealistic shapes (29). Without this feature, most of the materials produced by the ANN have unrealistic bonds formed across the periodic boundaries. To facilitate convergence, feature matching (33) was added for both the materials and the energy shapes. Last, given that lattice constants were not explicitly included in either the material or the energy grids, a lattice constant–generating network was added that produces accurate parameters based on the correlation between the lattice constants and the material/energy grids (29). Details on the training of ZeoGAN can be found in section S1-2.
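The two training aids named above can be sketched in a few lines of NumPy. Periodic padding is shown as wrap padding of the three spatial axes, so that a subsequent "valid" convolution treats the unit cell as periodic; a toy 3 × 3 × 3 averaging filter stands in for a learned convolution. Feature matching is shown as the squared distance between batch-averaged intermediate critic features. The pad width, the layer whose features are matched, and all function names are assumptions for illustration; ZeoGAN's actual architecture is given in section S1-1.

```python
import numpy as np


def periodic_pad(x, width=1):
    """Wrap-pad the spatial axes of a (channels, nx, ny, nz) array.

    A 'valid' convolution on the padded array then treats the unit cell as
    periodic, so features near one face of the cell see atoms on the opposite face.
    """
    pad = ((0, 0), (width, width), (width, width), (width, width))
    return np.pad(x, pad, mode="wrap")


def box_filter_3x3x3(x):
    """Toy 'valid' 3x3x3 averaging filter applied after periodic padding."""
    p = periodic_pad(x, 1)
    nx, ny, nz = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out += p[:, dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return out / 27.0


def feature_matching_loss(real_features, fake_features):
    """Squared L2 distance between batch-averaged intermediate critic features."""
    diff = np.mean(real_features, axis=0) - np.mean(fake_features, axis=0)
    return float(np.sum(diff ** 2))
```

Because the padding wraps, a density peak sitting at one corner of the cell contributes to filter outputs at the opposite corner, which is how bonds across the periodic boundary become visible to the critic.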

Generation of pure silica zeolites

Overall, 31,713 methane-accessible zeolites were used to train our neural network. The learning process of ZeoGAN shows the evolution of the material/energy shapes from their initial Gaussian noise distributions (Fig. 2A). Treating the shapes as probability distributions, the EMD was selected as the metric to determine convergence in our training, with a smaller EMD corresponding to more realistic zeolite shapes (section S1-2). The EMD in Fig. 2A shows converging behavior, and snapshots at different learning steps are shown to illustrate the evolution of the shapes. Specifically, the material/energy shapes initially resemble a typical noise distribution (first picture in the inset of Fig. 2A), but as learning progresses, the material and energy shapes come to occupy separate regions of the unit cell, morphing into shapes that look like a typical zeolite.

Fig. 2 Learning curve of ZeoGAN and histogram of Si:O ratio values.

(A) EMD as a function of ZeoGAN iteration steps. The inset shows the evolution of specific material (red/yellow) and energy (green) shapes. (B) Normalized frequency of Si:O ratio values for 1 million ZeoGAN outputs (top). Representative zeolite structures, with atom positions extracted from ZeoGAN-generated zeolite shapes, for outputs with different Si:O ratios (bottom).

In total, 1 million zeolite shapes (both material and energy) were generated from ZeoGAN (step 1); from these shapes, a simple rule (section S1-4) was used to assign the positions of the oxygen and silicon atoms (step 2). Next, the Si:O ratio was calculated for each output of step 2, and the results were tabulated as shown in Fig. 2B (noting that all four-connected pure silica zeolites should, in theory, have Si:O = 0.5). To assess the quality of the outputs, representative ZeoGAN outputs are illustrated in Fig. 2B for Si:O = 0.3, 0.4, 0.5, 0.6, and 0.7. While several issues (e.g., wrong Si:O ratios and inaccurate bond lengths and bond angles) prevent most of these outputs from being real zeolites, the majority of them share similar characteristics with real zeolites in terms of the distributions of silicon and oxygen atoms. Continuing from step 2, the subset of outputs with 0.45 < Si:O < 0.55 was retained (65% of the total zeolite shapes), and from this set, the ones with 75% or higher proper bond connectivity (where each Si has four oxygen bonds and each oxygen has two silicon bonds) were kept, reducing the candidate set to 901. For bond connectivity, Si and O atoms were assumed to form a bond if their distance was less than 2.5 Å. From this reduced set, bond connectivity was repaired by adding/removing atoms via a random search (step 3), yielding 674 structures with Si:O = 0.5 and 100% connectivity. From this set, only the structures with a small number of symmetrically unique T atoms (≤10) were kept, because most structures with a high number of unique T atoms tend to be nonsymmetric; this reduced the total number to eight. Last, these fully connected structures were optimized using classical molecular simulations (section S1-4) with the same parameters used by Deem and co-workers to facilitate comparisons (step 4).
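The filtering stage can be sketched as follows. The actual position-assignment rule is described in section S1-4 and is not reproduced here; extract_positions is a hypothetical stand-in that takes periodic local maxima above a threshold of 0.5. The screen uses the 2.5 Å Si–O cutoff and the 0.45 < Si:O < 0.55 and 75% thresholds quoted above, and it assumes that "proper bond connectivity" means the fraction of atoms with the correct coordination (four O per Si, two Si per O). Distances use the minimum-image convention in an orthogonal cell, which is valid only when every cell edge exceeds twice the cutoff.

```python
import numpy as np

BOND_CUTOFF = 2.5  # Å, Si-O bonding distance cutoff stated in the text


def extract_positions(channel, threshold=0.5):
    """Hypothetical peak picking: periodic local maxima above `threshold`."""
    is_peak = channel > threshold
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if (dx, dy, dz) != (0, 0, 0):
                    neighbour = np.roll(channel, (dx, dy, dz), axis=(0, 1, 2))
                    is_peak &= channel >= neighbour
    # Grid indices -> fractional coordinates
    return np.argwhere(is_peak) / np.array(channel.shape, dtype=float)


def si_o_screen(si_frac, o_frac, cell_lengths):
    """Return (Si:O ratio, fraction of atoms with proper coordination)."""
    si = np.asarray(si_frac, dtype=float).reshape(-1, 3)
    o = np.asarray(o_frac, dtype=float).reshape(-1, 3)
    d = si[:, None, :] - o[None, :, :]
    d -= np.round(d)  # minimum image; assumes cell edges > 2 * BOND_CUTOFF
    dist = np.linalg.norm(d * np.asarray(cell_lengths, dtype=float), axis=-1)
    bonded = dist < BOND_CUTOFF  # shape (n_si, n_o)
    si_ok = bonded.sum(axis=1) == 4  # each Si bonded to four O
    o_ok = bonded.sum(axis=0) == 2  # each O bonded to two Si
    connectivity = (si_ok.sum() + o_ok.sum()) / (len(si) + len(o))
    return len(si) / len(o), float(connectivity)


def passes_filter(ratio, connectivity):
    """Thresholds quoted in the text: 0.45 < Si:O < 0.55 and >= 75% connectivity."""
    return 0.45 < ratio < 0.55 and connectivity >= 0.75
```

An isolated SiO4 unit, for example, has Si:O = 0.25 and only its silicon correctly coordinated (connectivity 1/5), so it fails both criteria.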

The overall operation explained above is summarized in Fig. 3 for three representative structures that passed through all the criteria. As can be seen from Fig. 3, all of the final relaxed structures resemble their respective initial zeolite shapes, providing evidence that our postprocessing operation does not significantly alter the essence of the zeolite shapes.

Fig. 3 Evolution of three zeolite shapes that successfully passed through the cleanup operation to yield Si:O = 0.5 and 100% bond connectivity.

For the eight resulting structures, the coordination sequence (34) was used as a simple metric to identify whether these structures exist in either the IZA or the PCOD database. The coordination sequences of all eight zeolites were computed, and seven were found in either the IZA or the PCOD test set, while one was an unknown zeolite not found in any of the sets. None of the eight zeolites were in the original training set, indicating that ZeoGAN successfully created zeolites that were not seen during the learning process. The cleaned-up zeolites and the corresponding zeolites from the IZA/PCOD databases are shown in Fig. 4. One of the zeolites identified is ASV, an experimentally verified topology with 12-, 6-, and 4-membered rings and one-dimensional channels.
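For readers unfamiliar with the metric, the coordination sequence of a T atom lists how many T atoms lie 1, 2, 3, … bonds away in the periodic framework. A minimal breadth-first-search sketch is shown below; the neighbour-list format, in which each T–O–T linkage is stored with the integer cell offset of the neighbouring atom, is a hypothetical input convention. Frameworks whose unique T atoms yield matching sequences over enough shells are candidate topological matches.

```python
def coordination_sequence(neighbours, start, n_shells):
    """Shell sizes N1..Nk around T atom `start` in a periodic net.

    neighbours: dict mapping T-atom index i to a list of (j, (dx, dy, dz)),
    meaning i is linked (via a bridging O) to atom j in the cell shifted by
    the integer offset (dx, dy, dz). This input format is hypothetical.
    """
    origin = (start, (0, 0, 0))
    seen = {origin}  # atoms are distinguished by index AND periodic image
    frontier = [origin]
    sequence = []
    for _ in range(n_shells):
        next_shell = []
        for i, (ox, oy, oz) in frontier:
            for j, (dx, dy, dz) in neighbours[i]:
                node = (j, (ox + dx, oy + dy, oz + dz))
                if node not in seen:
                    seen.add(node)
                    next_shell.append(node)
        sequence.append(len(next_shell))
        frontier = next_shell
    return sequence
```

As a sanity check, a simple cubic net (one atom per cell, six neighbours) gives the familiar shell sizes 6, 18, 38, 66.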

Fig. 4 Eight cleaned-up zeolites generated from ZeoGAN and their corresponding counterpart zeolites in IZA/PCOD.

Dashed blue lines indicate the matching unit cell portions. A few of the generated zeolites are not three-dimensional structures because the zeolite training set already contained such structures.

Inverse design of zeolites using ZeoGAN

Thus far, the zeolites generated from ZeoGAN did not have any user-desired properties. To test the user-desired capability, the methane heat of adsorption was selected, as it is an important metric that experimentalists often target in materials design (35). Accordingly, the ZeoGAN loss function was altered such that zeolites with a heat of adsorption between 18 and 22 kJ/mol were targeted for generation (section S1-3). As the data for 1 million newly generated zeolite shapes in Fig. 5A show, the methane heat of adsorption distribution shifts sharply compared with that of the training set, indicating that the user-desired criterion functions properly. It is also worth mentioning that the distributions of methane KH and void fraction did not change much under the new loss function, suggesting that these properties are largely uncorrelated with the selected methane heat of adsorption range.
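The exact form of the modified loss is given in section S1-3. Purely as an illustration of the idea, one hypothetical choice is a hinge penalty that is zero when the Qst computed from a generated energy grid lies inside the 18 to 22 kJ/mol window and grows linearly outside it, added with a weight to the WGAN generator loss. The linear shape of the penalty, the weight, and the function names are all assumptions, not ZeoGAN's settings.

```python
import numpy as np

Q_LOW, Q_HIGH = 18.0, 22.0  # kJ/mol, target window stated in the text


def range_penalty(q_st):
    """Hypothetical hinge penalty: zero inside [Q_LOW, Q_HIGH], linear outside."""
    q = np.asarray(q_st, dtype=float)
    return np.maximum(Q_LOW - q, 0.0) + np.maximum(q - Q_HIGH, 0.0)


def generator_loss(critic_scores_fake, q_st_fake, weight=1.0):
    """WGAN generator term plus a weighted property penalty (weight is hypothetical)."""
    return -np.mean(critic_scores_fake) + weight * np.mean(range_penalty(q_st_fake))
```

Because the penalty is flat inside the window, it only pushes generated samples toward the target range without favouring any particular value within it.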

Fig. 5 User-desired generation results.

(A) Distributions (methane KH, methane void fraction, and methane heat of adsorption) for 31,713 training set zeolites (pink), 1 million user-desired zeolite shapes (green), and 6 user-desired zeolites (yellow markers). (B) Two representative structures generated from the user-desired scheme that yielded methane heat of adsorption in the user-desired range of 18 to 22 kJ/mol.

Next, the same cleanup procedure described in Fig. 3 was applied to the 1 million user-desired zeolite shapes, yielding six zeolites (two from the training set, two from the PCOD/IZA set, and two outside of the dataset) as well as one zeolite (PCOD 8308701) that was also produced in the non–user-desired set. Of these six zeolites, four were found to have a methane heat of adsorption between 18 and 22 kJ/mol (the other two had 17.1 and 23 kJ/mol), providing a reliable indication of successful inverse design of zeolites. To the best of our knowledge, none of the existing experimental or computational methods for porous materials can a priori target a property within such a narrow range. Two representative cleaned-up zeolites are shown in Fig. 5B, and comparison of the coordination sequences reveals that they correspond to the zeolite ACO and PCOD 8242361 structures (see section S2 for more detail).

Thus far, the zeolite candidates generated from the ANN were restricted to at most 10 unique T atoms, as most experimentally synthesized zeolites and all of the hypothetical zeolites satisfy this restriction. However, zeolites can have more unique T atoms, as evidenced by zeolite UOZ (in the IZA database), which has 25. Upon removing the restriction on the number of unique T atoms, we observe a significant increase in the number of zeolites generated via the ANN (see Fig. 6). Specifically, adopting the conventional framework energy limit below which zeolites are deemed thermodynamically stable (30 kJ/mol Si) (14), the number of feasible zeolite structures was counted to be 121. One can extend the 30 kJ/mol Si limit by adopting the maximum framework energy of zeolite RWY (105.6 kJ/mol Si), in which case the total number of ANN-generated zeolites jumps to 1138 (although 105.6 kJ/mol Si might be too large a limit for practical synthesis). The representative zeolites in Fig. 6 appear reasonable. Without the restriction on unique T atoms, the total number of generated zeolites jumps from 14 to 1138, vastly enlarging the set and increasing the yield of zeolites from 0.0007 to 0.06% of the 2 million generated material/energy shapes. Notably, 1127 of these structures are not found in any existing database because they are more complex zeolites; as such, our ANN has successfully extended the number of new zeolites in the pure silica zeolite material space.
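The yields quoted above follow from dividing the zeolite counts by the 2 million generated shapes (1 million from each of the two generation runs):

```latex
\frac{14}{2\times 10^{6}} = 7\times 10^{-6} = 0.0007\%,
\qquad
\frac{1138}{2\times 10^{6}} = 5.69\times 10^{-4} \approx 0.06\%
```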

Fig. 6 Number of zeolites versus the number of unique T atoms.

Some representative zeolites are shown for different numbers of T atoms: 12 (left top), 28 (left bottom), 48 (right top), and 64 (right bottom).


Here, we successfully demonstrated inverse design of zeolites using our in-house–developed ANN. This work can potentially pave the way for using ANNs to target user-desired properties before material design/synthesis, and the tools can be extended to any application in which the energy grid maps onto the properties relevant to that application (e.g., gas storage/separation, catalysis, and sensors). Moreover, while the present ANN is restricted to silicon and oxygen atoms for simplicity, its number of input channels can be increased to cover more complex crystalline materials such as MOFs and COFs, thereby broadening the scope of the work and informing the future design of diverse classes of materials.


Existing zeolite structures

To design zeolites with sufficient methane accessibility, 217 IZA (four-connected and nondisordered) and 331,163 PCOD zeolites were trimmed down to 99,362 structures with a methane Henry coefficient (KH) larger than 10⁻⁶ mol kg⁻¹ Pa⁻¹. Of this methane-accessible set, the 63,426 structures with orthogonal cell angles (102 IZA and 63,324 PCOD) were used in this study. To validate the generation of zeolite structures not seen during training, half of the orthogonal zeolites (31,713) were randomly selected as the input (training set) of the ANN.
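The selection steps above can be sketched as a short filter. The record format (dictionaries with hypothetical keys kh for the methane Henry coefficient in mol kg⁻¹ Pa⁻¹ and angles for the cell angles in degrees), the 10⁻³° tolerance for "orthogonal", and the seeded shuffle are assumptions; the text states only that half of the orthogonal set was chosen at random.

```python
import random

KH_MIN = 1e-6  # mol kg^-1 Pa^-1, methane-accessibility threshold from the text


def is_orthogonal(angles, tol=1e-3):
    """True if all three cell angles are 90 degrees within tol (assumed tolerance)."""
    return all(abs(a - 90.0) < tol for a in angles)


def build_training_set(structures, seed=0):
    """Split methane-accessible orthogonal structures into two random halves.

    structures: list of dicts with hypothetical keys "kh" and "angles".
    Returns (training_set, held_out_set).
    """
    accessible = [s for s in structures if s["kh"] > KH_MIN]
    orthogonal = [s for s in accessible if is_orthogonal(s["angles"])]
    shuffled = list(orthogonal)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

Applied to the 63,426 orthogonal structures, an even split like this gives 31,713 training structures, with the remainder available as the held-out set against which generated zeolites can be checked.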

Molecular simulations

The energy grids were constructed using classical molecular simulations. To keep the grid sizes consistent, each of the three lattice vectors was divided into 32 equal intervals. At each of the 32 × 32 × 32 grid points, a methane molecule was placed to calculate the interaction energy between methane and the framework atoms. The interaction energy was computed using the Lennard-Jones (LJ) 12-6 potential model

$$U_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] \qquad (1)$$

where r is the distance between methane and the framework atom, and the LJ force field parameters ε and σ were taken from García-Pérez et al. (36). The Lorentz-Berthelot mixing rules were used to compute the interactions between different atom types.
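A minimal NumPy sketch of the grid construction follows. It applies Eq. 1 with Lorentz-Berthelot cross terms (geometric-mean ε, arithmetic-mean σ) between methane, treated here as a single LJ site (an assumption), and each framework atom of an orthogonal cell. Any force field values passed in are placeholders, not the García-Pérez et al. parameters; the units (ε in K, σ in Å), the cutoff of half the shortest cell edge with the minimum-image convention, and the function names are assumptions. Production codes typically replicate the unit cell so that longer cutoffs can be used.

```python
import numpy as np


def lorentz_berthelot(eps_i, sig_i, eps_j, sig_j):
    """Cross parameters: geometric-mean epsilon, arithmetic-mean sigma."""
    return np.sqrt(eps_i * eps_j), 0.5 * (sig_i + sig_j)


def lj_12_6(r, eps, sig):
    """Lennard-Jones 12-6 potential, Eq. 1."""
    sr6 = (sig / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def methane_energy_grid(frame_frac, frame_types, cell_lengths, params, probe, n=32):
    """Probe-framework LJ energy on an n^3 grid of an orthogonal cell (hypothetical helper).

    params: dict atom type -> (epsilon, sigma); probe: (epsilon, sigma) of methane.
    Minimum-image convention with a cutoff of half the shortest cell edge.
    """
    cell = np.asarray(cell_lengths, dtype=float)
    r_cut = 0.5 * cell.min()
    axis = np.arange(n) / n  # fractional coordinates of grid points
    pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    energy = np.zeros(len(pts))
    for frac, atom_type in zip(np.asarray(frame_frac, dtype=float), frame_types):
        eps, sig = lorentz_berthelot(*probe, *params[atom_type])
        d = pts - frac
        d -= np.round(d)  # minimum image in fractional coordinates
        r = np.maximum(np.linalg.norm(d * cell, axis=1), 1e-3)  # avoid r = 0
        energy += np.where(r < r_cut, lj_12_6(r, eps, sig), 0.0)
    return energy.reshape(n, n, n)
```

Setting n to 32 reproduces the grid size used in the text; smaller n is convenient for quick checks.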

Using the energy grids, adsorption properties such as the Henry coefficient (KH) and the isosteric heat of adsorption (Qst) can be calculated directly during the learning stage of the ANNs. The unitless KH and the Qst of a methane molecule in the property grid are calculated from the following equations

$$K_{\mathrm{H}}^{\text{unitless}} = \frac{1}{N}\sum_{i}^{N} e^{-\beta E_i} \qquad (2)$$

$$Q_{\mathrm{st}} = k_{\mathrm{B}}T - \frac{\sum_{i}^{N} E_i\, e^{-\beta E_i}}{\sum_{i}^{N} e^{-\beta E_i}} \qquad (3)$$

where N is the number of grid points, kB is the Boltzmann constant, β = 1/kBT, and Ei is the energy at the ith grid point. All of these adsorption properties were calculated at 298 K.
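Equations 2 and 3 translate directly into NumPy, as sketched below under the assumption that the grid energies are molar (kJ/mol), so that kB is replaced by the gas constant R and Qst comes out in kJ/mol. The cap on strongly repulsive energies is an added safeguard, not part of the original method: grid points on top of framework atoms can be infinite, and capping them keeps their Boltzmann weights negligible while avoiding inf × 0 = NaN. The function name is hypothetical.

```python
import numpy as np

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def henry_and_qst(energy_kjmol, temperature=298.0, e_cap=1.0e3):
    """Unitless K_H (Eq. 2) and Q_st (Eq. 3) from a grid of probe energies.

    Energies are molar (kJ/mol), so k_B is replaced by R. Energies above
    e_cap (e.g. inf at atom centres) are capped, an added assumption.
    """
    e = np.minimum(np.asarray(energy_kjmol, dtype=float).ravel(), e_cap)
    beta = 1.0 / (R_KJ * temperature)
    weights = np.exp(-beta * e)  # Boltzmann factor at each grid point
    k_h = weights.mean()  # Eq. 2: average over all N grid points
    q_st = R_KJ * temperature - np.sum(e * weights) / np.sum(weights)  # Eq. 3
    return k_h, q_st
```

For a grid with uniform energy E, Eq. 3 reduces to Qst = RT − E, which makes a convenient sanity check.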


Supplementary material for this article is available at

Supplementary Materials and Methods

Section S1. Computational methods

Section S1-1. Details of ZeoGAN

Section S1-2. Training of ZeoGAN

Section S1-3. Training for user-desired properties

Section S1-4. Zeolite cleanup

Section S2. Zeolite generation results

Section S2-1. Generated non–user-desired zeolite structures

Section S2-2. Generated user-desired zeolite structures

Fig. S1. Architecture of ZeoGAN.

Fig. S2. Training results after adding user-desired loss.

Fig. S3. Allowed next structure moves for the connectivity repairing algorithm.

Fig. S4. Distributions (methane KH, methane void fraction, and methane heat of adsorption) for non–user-desired generation.

Fig. S5. Summary of non–user-desired zeolite shapes (both materials and energy shapes) and corresponding cleaned-up structures (both materials and energy shapes).

Fig. S6. Summary of user-desired zeolite shapes (both materials and energy shapes) and corresponding cleaned-up structures (both materials and energy shapes).

Fig. S7. Matching zeolites in IZA/PCOD for the user-desired zeolites.

Table S1. ZeoGAN hyperparameters.

Table S2. Various methane properties of the eight non–user-desired zeolites.

Table S3. Various methane properties of the six user-desired zeolites.

References (37–40)

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: Funding: This work was supported, in part, by the Mid-Career Researcher Program (NRF-2017R1A2B4004029), and, in part, by the Energy Cloud R&D Program (NRF-2019M3F2A1072233) through the NRF (National Research Foundation of Korea), both funded by the Ministry of Science and ICT. This work was also supported by the BK21 Plus Program funded by the Ministry of Education (MOE, Korea). Author contributions: J.K. formulated the project. B.K. conducted the molecular and ANN simulations. S.L. designed the neural network. All authors contributed to the writing/revising of the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. ZeoGAN code can be downloaded from Additional data related to this paper may be requested from the authors.