## Abstract

Data science has primarily focused on big data, but for many physics, chemistry, and engineering applications, data are often small, correlated and, thus, low dimensional, and sourced from both computations and experiments with various levels of noise. Typical statistics and machine learning methods do not work for these cases. Expert knowledge is essential, but a systematic framework for incorporating it into physics-based models under uncertainty is lacking. Here, we develop a mathematical and computational framework for probabilistic artificial intelligence (AI)–based predictive modeling combining data, expert knowledge, multiscale models, and information theory through uncertainty quantification and probabilistic graphical models (PGMs). We apply PGMs to chemistry specifically and develop predictive guarantees for PGMs generally. Our proposed framework, combining AI and uncertainty quantification, provides explainable results leading to correctable and, eventually, trustworthy models. The proposed framework is demonstrated on a microkinetic model of the oxygen reduction reaction.

## INTRODUCTION

Models in the chemical and physical sciences have led to both new understanding and new discoveries (*1*) including new materials (*2*, *3*). Physics-based models span orders of magnitude in length and time, ranging from quantum mechanics (*4*) to chemical plants (*5*), and naturally capture physics-based constraints (*6*–*8*). Combining models across scales, known as multiscale modeling (*9*), is necessary when chemical properties are determined at the quantum level, but most experiments and relevant applications exist at the macroscale, such as in heterogeneous catalysis. At the core of model development lies the question of accuracy of a physics-based model. Going beyond sensitivity analysis (*10*, *11*), there has been growing interest in quantifying uncertainty, resulting from correlations in parameters (*12*, *13*) along with other sources of error arising in predicting new materials (*14*). In addition to ensuring trustworthiness, error quantification can enable model correctability (*15*, *16*). Still, uncertainty is an afterthought in actual physics-based model development. Currently, a model is first built deterministically without systematically accounting for the effect of both modeling errors and lack, or sparseness, of data.

Modeling uncertain data has experienced tremendous advances in data science (*17*–*20*); however, the corresponding models are empirical, can fail without guarantee, and can violate conservation laws and constraints. Current approaches for handling data based on physical laws and chemical theory are, in a sense, not truly probabilistic and require correlations and causal relationships to be known a priori. With the increasing size of chemistry datasets, it is almost impossible to apply traditional methods of model development to systems with many sources of interacting error. Global sensitivity techniques, such as Sobol indices, attribute model variance to model variables and their interaction (hereafter called “parametric uncertainty”) (*21*). However, there are few methods that work beyond first-order interactions or quantify the importance of missing physics or submodels rather than parameters (hereafter called “model uncertainty”) (*22*). Methods that do exist model missing physics as stochastic noise that has no structure (*23*, *24*). Therefore, there is a need to develop methods that both attribute interaction error directly to model inputs and provide predictive guarantees and, by doing this, to make models correctable and eventually trustworthy for predictions and design tasks.

Here, we address these issues by incorporating error and uncertainty directly into the design of a model. First, we introduce the use of Bayesian networks (*20*), a class of probabilistic graphical models (PGMs) (*25*), common in probabilistic artificial intelligence (AI) (*26*), to integrate simultaneously and systematically physics- and chemistry-based models, data, and expert knowledge. This framework is termed C-PGM (chemistry-PGM). Second, we derive global uncertainty indices that quantify model uncertainty stemming from different physics submodels and datasets. This framework generates predictive “worst-case” guarantees for Bayesian networks while handling correlations and causations in heterogeneous data for both parametric and model uncertainties and is based on recent work in robust methods for quantifying probabilistic model uncertainty (*27*, *28*). Our proposed framework, combining AI and uncertainty quantification (UQ), systematically apportions and quantifies uncertainty to create interpretable and correctable models; this is performed through assimilation of data and/or improvement of physical models to enable trustworthy AI for chemical sciences. We reduce the complexity of the nondeterministic polynomial time (NP)–hard problem of learning a PGM by leveraging expert knowledge of the underlying chemistry. We demonstrate this framework in the prediction of the optimal reaction rate and oxygen binding energy for the oxygen reduction reaction (ORR) using the volcano model. While UQ has been applied to deterministic volcano-based models in general (*29*), and the ORR model specifically (*30*), prior methods have been limited both by the physics model’s underlying structure and, importantly, the lack in interpretability of uncertainty in predictions in terms of modeling decisions and available data in different model components. We demonstrate that about half of the model uncertainty stems from density functional theory (DFT) uncertainty, comparable error from lack of sufficient number and quality of experimental data and from correlations in parameters (~20% each), and the remaining (~10%) from the solvation model. This analysis provides a blueprint for prioritizing model components toward correctability and improved trustworthiness by underscoring the need foremost of more accurate electronic structure calculations and secondary by better experiments. We illustrate model correctability with an example.

## RESULTS

### Physics model for the ORR and the deterministic volcano

Hydrogen fuel cells can nearly double the efficiency of internal combustion engines and leave behind almost no emissions, especially if environmentally low footprint H_{2} is available (*31*). Furthermore, the hydrogen fuel cell is a mature technology that produces electricity via the hydrogen oxidation reaction at the anode and the ORR at the cathode (Fig. 1B); polymer electrolyte membrane fuel cells for this type of reaction are commercially available (*32*). Because of the high cost of platinum (Pt) catalyst and stability problems of other materials in an acidic electrolyte, recent focus has been on developing alkaline electrolytes. This technology, while extremely promising, results in slower reaction rates (by ~2 orders of magnitude compared to a Pt/acidic electrolyte) and thus bigger devices (*33*, *34*). Overcoming slower rates with stable materials requires discovery of new, multicomponent catalysts, e.g., core-shell alloys.

The ORR depends on the formation of surface hydroperoxyl (OOH*), from molecular oxygen (O_{2}), and of water (H_{2}O), from surface hydroxide (OH*) (*35*). The complete mechanism (*7*, *36*, *37*) involves four electron steps (Fig. 1A) and is described in detail in section S1. Among these, reactions R1 and R4 are slow (*7*). Acceleration of the ORR then translates into finding materials that speed up the slower of the two reactions, R1 and R4. An approach to find new materials entails generation of an activity model (Fig. 1C) as a function of descriptor(s) that can be estimated quickly using DFT calculations (*9*). This is known as the deterministic volcano (Sabatier’s principle) and has been the key model for discovery of new materials.

Next, we discuss the human workflow in constructing the volcano curve. First, we use a physics equilibrium model to compute the rate *r* from the minimum free energy of reactions R1 and R4 (*7*, *38*), such that*k*_{B} is the Boltzmann constant and *T* is the temperature. Instead of Eq. 1, one could use a more elaborate model, such as a mean-field microkinetic (detailed reaction mechanism) or a kinetic Monte Carlo model, which is a more complex multiscale model. Such models impose conservation laws (mass conservation and catalyst site balance) and are selected on the basis of expert knowledge. The Gibbs free energy Δ*G _{i}* of the ith species is calculated from the electronic energy (

*E*

_{DFT}) and includes the zero-point energy, temperature effects, and an explicit solvation energy (

*E*

_{solv}) in water, as detailed in Methods. This calculation entails, again, physics-based models (statistical mechanics here) and expert knowledge, e.g., in selecting a solvation model and statistical mechanics models. See section S1 for an explanation of the equilibrium model and resulting formula in Eq. 1.

The free energies Δ*G*_{1} and Δ*G*_{4} are computed as linear combinations of the free energies of species, while accounting for stoichiometry (a constraint), and are regressed versus Δ*G*_{O} (the descriptor); see data in Fig. 1C. Typically, only two to three data points for coinage metals (Ag, Au, and Cu) on the right leg of the volcano are regressed, especially if experimental data (instead of DFT data) are used (data corresponding to dotted lines are not observed in most experiments). The intersection of the two lines (Fig. 1C) determines the maximum of the volcano curve and provides optimal material properties, i.e., the

### Probabilistic AI for chemistry and the probabilistic volcano

Here, we develop a probabilistic AI-based workflow that augments the human workflow (in Fig. 1D) to create a probabilistic volcano. The mathematical tool we use to formulate the probabilistic volcano is the PGM. PGMs represent a learning process in terms of random variables, depicted as vertices of the graphs, which explicitly model their interdependence in terms of a graph. This interdependence stems from (i) one variable influencing others, called causality, depicted by arrows (directed edges); and (ii) correlations among variables, depicted by simple (undirected) edges between vertices (see below). PGMs are defined as the parameterized conditional probability distribution (CPD) and for Bayesian networks are defined such that

*Pa _{Xi}* = {

*X*

_{i1}…,

*X*} ⊂ {

_{im}*X*

_{1}…,

*X*} denotes the parents of the random variable

_{n}*X*, and θ = {θ

_{i}_{Xi ∣ PaXi}}

^{n}_{i = 1}are the parameters of each CPD,

*P*(

*X*∣

_{i}*Pa*, θ

_{Xi}*). Uppercase “*

_{XiPaXi}*P*” indicates a stochastic model or submodel. A key concept in PGMs is that the random variables are conditionally independent. This concept is central to constructing complex probability models with many parameters and variables, enabling distributed probability computations by “divide and conquer” using graph-theoretic model representations. By combining the conditional probabilities in Eq. 2, we find the joint probability distribution of all random variables

*X*. A detailed formalism for the construction of the PGM is given in section S4.

Structure learning of graphical models is, in general, an NP-hard problem (*39*, *40*) if one considers the combinatorial nature in connecting a large number of vertices. We overcome this challenge by constraining the directed acyclic graph (DAG) (*41*), representing the probabilistic ORR volcano (Fig. 2A), using domain knowledge that includes multiscale, multiphysics models discussed above, expert knowledge, and heterogeneous data (experimental and DFT) along with their statistical analysis.

First, statistical analysis of data finds hidden correlations or lack thereof between variables and is also central to building the PGM. In this example, statistical analysis of the computed formation free energy data of O*, OOH*, and OH* indicates correlations among data, i.e., connections between vertices (Fig. 2A). Specifically, Δ*G*_{OOH} and Δ*G*_{OH} are correlated with Δ*G*_{O}. The correlation coefficients of Δ*G*_{O} with −Δ*G*_{1} and −Δ*G*_{4} are −0.95 and 0.91, respectively; see section S4 for notes on statistical independent tests used. Reaction free energies Δ*G*_{1} and Δ*G*_{4} are linear combinations of Δ*G*_{OOH} and Δ*G*_{OH}, respectively; we use the reaction free energies as dependent vertices, as the reaction rate depends directly on Δ*G*_{1} and Δ*G*_{4}. Subsequently, we choose Δ*G*_{O} as the independent node (descriptor) because, of all the surface intermediates, it has the fewest degrees of freedom (and therefore local minima) on any given potential energy surface for faster quantum calculations. The selection of the descriptor, which is another example of expert knowledge in our C-PGM, establishes causal relationship (direction of influence) represented by directed edges from Δ*G*_{O} to Δ*G*_{1} and Δ*G*_{4}. Expert knowledge is also leveraged to assign relevant errors (ω) to vertices and directed edges. Figure 2A (colored circles) depicts the multiple uncertainties (random variables) ω modeled in each CPD of the PGM and how these (causally) influence the uncertainty of each vertex. All these causal relationships are modeled by a DAG in Fig. 2A and the Bayesian network in Eq. 3. Causality simplifies the construction and UQ analysis of the PGM. Last, the lack of an edge between Δ*G*_{1} and Δ*G*_{4} (Fig. 2A) is found from conditional independence tests on the DFT data. By eliminating graph edges of uncorrelated parts of the graph, the constrained DAG is profoundly simpler. A complete, step-by-step discussion of the structure learning of the ORR C-PGM is included in section S4.1.

The C-PGM structure contains information from expert knowledge, causalities, physics (physical models, conservation laws, and other constraints), correlations of data, parameters, and hierarchical priors (priors of a prior) in model learning (*13*, *25*). The physical meaning and estimation of these uncertainties are discussed below. Overall, the model for the ORR C-PGM becomes*G*_{O (DFT)} indicates a calculated value from DFT and all other Δ*G* values represent the “true value” given errors. Lowercase “*p*” indicates probability densities that are assumed here to be Gaussian, thus rendering the C-PGM (Eq. 3) into a Gaussian Bayesian network (*25*). Note that this PGM is used as part of an optimization scheme where Δ*G*_{O (DFT)} is formulated as a random variable given any value of the true Δ*G*_{O} and distribution of errors for Δ*G*_{O}. Leveraging the human-based (deterministic) workflow in Fig. 1D, the ORR is modeled as a stochastic optimization problem such that*r**. It is convenient to compute *k*_{B}*T* ln (*r**),

For the rest of this paper, *k*_{B}*T* ln (*r**) are considered as the QoIs (quantities of interest) that need to be optimized.

### Model uncertainty, guarantees, nonparametric sensitivity, and contributions to model error for interacting variables

Model uncertainty arises from multiple sources, such as use of sparse data in Fig. 1C, hidden correlations between vertices in the graph, simplified statistics models (linear regression between free energies in Fig. 1C and Gaussian approximations of errors; Fig. 2, B to E), and uncertainty in different model components and variables. These include errors in experimental data (ω* _{e}*), DFT data (ω

*), solvation energies (ω*

_{d}*), and regressions (correlations) used to determine the optimum*

_{s}*); correlation error is accentuated by the small data available especially on the right leg of the volcano. Experimental errors (ω*

_{c}*) in Δ*

_{e}*G*

_{O}and Δ

*G*

_{OH}can be found by repeated measurements in the same laboratory and between different laboratories. Repeated calorimetry and temperature-programmed desorption measurements for the dissociative adsorption enthalpy of O

_{2}in the same and different labs provide a distribution of errors for Δ

*G*

_{O}. The distribution of DFT errors (ω

*) is computed by comparing experimental and calculated (DFT) data across various metals. The mean value and SD of errors are provided in table S1 along with a detailed description of how errors were calculated in Methods and section S3. In Fig. 2A, the “parent vertex” is determined by the direction of the arrow such that ω*

_{d}_{e1}is a parent of Δ

*G*

_{1}(the child). These additional uncertainties from multiple sources are shown in Fig. 2 (B to E) and are combined to build the PGM model

*P*.

When building the C-PGM model *P*, “model uncertainties” arise from the sparsity and quality of the available data in different components of the model, the accuracy of the physics-based submodels, and the knowledge regarding the probability distribution of the errors (Fig. 2, B to E). Consequently, the mean value of min{ − Δ*G*_{1}, − Δ*G*_{4}} with respect to *P* is itself uncertain since the probabilistic model *P* is uncertain. For this reason, we consider *P* as a baseline C-PGM model, i.e., a reasonable but inexact “compromise.” Here, we take *P* to be a Gaussian Bayesian network (see Fig. 2, B to E), where the error probability distribution function for each component of the model is approximated as a normal distribution and is built using the nominal datasets and submodels discussed above. We isolate model uncertainty in each component (CPD) of the entire model (Eq. 3), in contrast to the more standard parametric (aleatoric) uncertainty already included in the stochasticity of *P* itself. We mathematically represent model uncertainty through alternative (to *P*) models *Q* that include the “true” unknown model *Q**. As examples, models *Q* can differ from *P* by (i) replacing one or more CPDs in Eq. 3 by more accurate, possibly non-Gaussian CPDs that represent better the data in Fig. 2 (B to E), (ii) more accurate multiscale physics models, and (iii) larger and more accurate datasets. Quantifying the impact of model uncertainties on predicting the QoI using *P*, instead of better alternative models *Q*, is discussed next. Overall, developing the mathematical tools to enable identification of the components of a PGM that need improvement is critical to correct the baseline model *P* with minimal resources.

Each model *Q* is associated with its own model misspecification parameter η that quantifies how far an alternative model *Q* is from the baseline model *P* via the Kullback–Leibler (KL) divergence of *Q* to *P, R*(*Q*‖*P*). We use the KL divergence due to its chain rule properties that allow us to isolate the impact of individual model uncertainties of CPDs in Eq. 2 on the QoIs in Fig. 2, see sections S6 and S7. To isolate and rank the impact of each individual CPD model misspecification (η* _{l}*), we consider the set of all PGMs

*Q*that are identical to the entire PGM

*P*except at the

*l*th component CPD (for dependence on the

*l*th parents) and less than η

*in KL divergence from the baseline CPD*

_{l}*P*(

*X*∣

_{l}*Pa*) while maintaining the same parents

_{Xl}*Pa*. We refer to this family, denoted by

_{Xl}*D*

^{η}

*, as the “ambiguity set” for the*

_{l}*l*th CPD of the PGM

*P*(see section S6 for its mathematical definition). Given the set of PGM’s

*D*

^{η}

*, we develop model uncertainty guarantees*

_{l}*Q*in

*D*

^{η}

*with respect to the baseline*

_{l}*P*

For a given η* _{l}*, the model uncertainty guarantees to describe the maximum (worst case) expected bias when only one part of the model in the PGM,

*P*(

*X*∣

_{l}*Pa*), is perturbed within η

_{Xl}*; therefore, they measure the impact of model uncertainty in any component (CPD) of the PGM on the QoI. Since η*

_{l}*are not necessarily small, the method is also nonperturbative, i.e., it is suitable for both small and large model perturbations.*

_{l}Equation 6 can be also viewed as a nonparametric model sensitivity analysis for PGMs since it involves an infinite dimensional family of model perturbations *D*^{η}* _{l}* of the baseline model

*P.*This family can consider the sparsity of data by addition of new or higher-quality data, e.g., higher-level DFT data, alternative densities to the Gaussians in Fig. 2 (B to E), e.g., richer parametric families or kernel-based CPDs, or more accurate submodels. All these are large, nonparametric perturbations to the baseline

*P*model. For these reasons, Eq. 7 allows one to interpret, reevaluate, and improve the baseline model by comparing the contributions of each CPD to the overall uncertainty of the QoIs through the (model uncertainty) ranking index

For more details, see theorem 1 in section S6 where we show that for Gaussian Bayesian networks, the ratios in Eq. 7 are computable.

We can use two strategies regarding η* _{l}*. First, η

*can by tuned “by hand” to explore how levels of uncertainty in each component of the model,*

_{l}*P*(

*X*∣

_{l}*Pa*), affect the QoIs. This approach is termed a stress test in analogy to finance where in the absence of sufficient data, models are subjected to various plausible or extreme scenarios. Second, instead of treating η

_{Xl}*as a constant, we can estimate η*

_{l}*as the “distance” between available data from the unknown real model and our baseline PGM*

_{l}*P*; we refer to such η

*’s as data based, in contrast to stress tests (see section S8). For example, the data can be represented by a histogram or a kernel density estimator (KDE) approximation (*

_{l}*42*). In this sense, the contribution to model uncertainty from any error source is both a function of its variance and how far away the error is from the baseline, e.g., the Gaussian CPDs in Fig. 2 (B to E).

Given the error distributions and their Gaussian representation in PGM model *P*, the expected value of min{ − Δ*G*_{1}, − Δ*G*_{4}} ∣ Δ*G*_{O} (black curve) in Fig. 3A is obtained. The color bar in Fig. 3A indicates how likely a reaction rate occurs with given Δ*G*_{O} for model *P* (aleatoric uncertainty). The gray dashed lines in Fig. 3A correspond to the two extreme scenarios (derived in section S5) for all possible models *Q* by considering uncertainty in all components. We highlight in Fig. 3B the expected value (black line) and the extremes (gray dashed lines) when only the DFT error in Δ*G*_{4} is considered. All η values here are data based and determine what models *Q* are considered in construction of the bounds; only PGMs that have a KL divergence that is less than or equal to η from the baseline are considered. The red, orange, and green lines indicate potential QoIs that can be computed; here, we focus on the uncertainty in the rate (*y* axis; difference between red lines) and the variability of optimal oxygen binding energy (*x* axis; difference between orange lines) as a proxy of materials selection; see section S7 for more details.

Using the model uncertainty guarantees (Eq. 6), we quantify the uncertainty and its impact on model predictions beyond the established parametric uncertainty; again, all η* _{l}*’s are data based. By sourcing the impact of each submodel and/or data, Eq. 7 reveals what data, measurement, and computation should be improved. The error in the optimal reaction rate (Fig. 4A) stems from a nearly equal contribution of submodels, specifically by solvation (30%), experiment (18%), DFT (33%), and parameter correlation (18%). The uncertainty in the optimal oxygen free energy variability (Fig. 4B), i.e., the materials prediction, stems from solvation (6%), experiment (8%), DFT (48%), and parameter correlation (37%). Different QoIs are influenced to a different degree by different submodels. In both QoIs, the DFT error stands as the most influential. The correlation between O*, OOH*, and OH* is the next most important component regarding materials prediction, whereas solvation is the second ranked component regarding reaction rate. Such predictions are nonintuitive. While previous work found that parametric-based microscale uncertainties can be dampened in multiscale models (

*43*), the results of this work will generalize to any models where fine-scale simulations (such as DFT) are sparse or the macroscale QoIs can be made proportional to the microscale properties. In the next section, we show that Eq. 6 and the resulting Fig. 4 can also be deployed to improve the baseline (purely Gaussian) model

*P*.

### Model correctability enabled by model UQ

Model uncertainty due to any submodel or dataset, quantified by η* _{l}* and Eq. 6, can be reduced by picking a better submodel or dataset than the original baseline model

*P*. Obviously, those CPDs that exhibit larger relative predictive uncertainty in Eq. 6 should be prioritized and corrected. In our case study, reducing the DFT error requires to further develop DFT functional and methods, a long-standing pursue not addressed here. Here, we illustrate how to carry out such model correctability through an example that is feasible to do. Specifically, we consider the model consisting of the data used to construct the volcano and its statistical representation as this is the second most influential parameter in materials prediction. We performed additional DFT calculations on core-shell bimetallics to create an expanded dataset compared to that in Fig. 1C (see Fig. 5A). By doing this, we compute the model sensitivity indices

_{l}Figure 5B shows the reduction of model uncertainty guarantees, defined as Eq. 6, which are due to the variance of error and the estimated model misspecification parameter in the correlation between DFT-calculated values of Δ*G*_{4} and Δ*G*_{O}, when more data (bimetallics) are added. With bimetallic data included, the correlation coefficients of Δ*G*_{O} with −Δ*G*_{1} and −Δ*G*_{4} are −0.95 and 0.95, respectively. The uncertainty is reduced both due to improved correlation and reduced SE in the regression coefficients as a result of more data.

## DISCUSSION

Here, we introduce PGM for chemistry to embed uncertainty into the design of a model. The approach provides a blueprint to systematically integrate desperate components of a model ranging from heterogeneous data (experimental and DFT) to expert knowledge to physics/chemistry models and constraints to correlations among data and causality between variables. Instead of a deterministic model, a probabilistic ensemble of models is created. Furthermore, the model uncertainty and sensitivity indices derived herein provide guarantees on model prediction to systematically identify the most influential model components causing predictive uncertainty and ultimately ensure trustworthiness of predictions. Overall, our proposed mathematical framework combines probabilistic AI and UQ to provide explainable results, leading to correctable and, eventually, trustworthy models. We illustrate this framework for a volcano-kinetic model for the ORR. We propagate both parametric and model uncertainty from several, small sets of input data to model predictions. We establish model error bounds on the ORR volcano to assess maximum and minimum rates given the binding energy of atomic oxygen as the primary descriptor. We assess the impact of these errors via model sensitivity indices, which quantify the percent error from uncertainty contributed by each variable to the predicted maximum ORR rate and the oxygen binding energy corresponding to that rate. The greatest contribution to errors (ordered from greatest to least) in the PGM-based ORR volcano in predicting materials arises from error in DFT calculations and correlations of OOH* and OH* binding energies with O* binding energies. Different from the materials, the reaction rates depend mainly on errors associated with DFT and solvation, yet experimental error and correlations are relatively large as well, i.e., a more equidistribution of error is observed and improved accuracy of all components is needed to size electrochemical devices. Improving the accuracy of DFT method and the quality and quantity of data can pave the way for more accurate models for finding new catalysts.

## METHODS

### DFT calculations

We study adsorption on the close-packed (111 and 0001) transition metal surfaces. We select the lowest energy site of O* and OH* for comparison with experiments to determine errors, which are summarized in table S1. We build the correlations for bimetallics from the lowest energy sites on the (111) and (0001) surfaces of the face-centered (fcc) and hexagonal close-packed (hcp) metals, respectively.

*Vacuum phase DFT setup*. We calculated binding energies and vibrational frequencies using the Vienna ab initio Simulation Package version 5.4 with the projector-augmented wave method (*44*). We use the Revised Perdew-Burke-Ernzerhof (RPBE) density functional (*45*) with D3 dispersion corrections (*46*). Simulation methods include use of spin-polarized calculations for gas-phase species and ferromagnetic metals, a 3 × 3 × 1 Monkhorst-Pack *k*-point sampling grid (*47*) for all slab calculations, and a 400-eV plane wave cutoff. Electronic energy convergence was set to 10^{−4} eV for the energy minimization step and 10^{−6} eV for frequency calculations.

For calculations of gas-phase species, the supercell size was 10 × 10 × 10 Å. A Brillouin zone was sampled at the gamma point; a 0.005 eV/Å force cutoff was used in geometry optimizations. For slab calculations, the force cutoff was set to 0.02 eV/Å with 20 Å of vacuum space. Adsorbate energies were calculated for OOH*, OH*, and O* on the most stable close-packed surface for fcc and hcp metals. The periodic cell consisted of four layers with 16 metal atoms in each layer; the bottom two layers were fixed at their bulk values, determined using a 15 × 15 × 15 *k*-point grid with the tetrahedron method and Blöchl corrections. Bulk metal lattice constants were pre-optimized with DFT using the Birch-Murnaghan equation of state (*48*). Zero-point energies are calculated for each adsorbate-surface combination and for all gas species. All input files were created using the Atomic Simulation Environment (*49*).

*Solvation phase DFT setup*. We emulate explicit solvation calculations from previous work (*38*) except that, here, we vary the number of water layers. Two to five layers of water were placed above a Pt(111) surface in a honeycomb pattern to simulate the aqueous phase above the surface. The double layer of water molecules was found to adequately capture water binding energies on Pt(111) and H bonds at the surface (*50*). We determined solvation energies for O* by placing it in an fcc hollow site on the water-covered surface. For OH* and OOH*, solvation energies were determined by replacing a water molecule on the surface with the respective species to determine solvation energies. Other than the choice of functional, the DFT setup was identical to that in the vacuum except that nine Pt atoms were included in each layer to accommodate the honeycomb water structure, the *k*-point sampling was increased to 4 × 4 × 1, and the plane wave cutoff was increased to 450 eV. To provide initial geometries, the Perdew-Burke-Ernzerhof (PBE) functional (*51*) was used for all solvation calculations. Solvation energy calculations on Pt(111) using the PBE functional do not cause inconsistencies with the use of the RPBE functional for vacuum phase calculations. Granda-Marulanda *et al*. (*52*) showed that on several 111 and 0001 surfaces, the average difference in OH* solvation using the PBE and RPBE functionals with dispersion corrections was 0.03 eV; the SDs using these functionals were similar at 0.08 and 0.11 eV, respectively. Rather than changes in solvation across different surfaces, we investigate the variance in solvation energy associated with the number of explicit water layers used. Because energies from PBE and RPBE are correlated, the variance in solvation energy with respect to number of water layers is expected to be similar.

*Temperature effects*. Temperature effects at 298 K were calculated using statistical thermodynamics in combination with the harmonic and ideal gas approximations (*53*). Both heat capacity and entropy effects were included in calculating Gibbs free energies used in the volcano curves. Entropy was removed when comparing to experimental enthalpies as discussed in section S3.

### Deriving model sensitivity indices

Using robust and scalable UQ methods for general probabilistic models (*27*, *28*, *54*) as a starting point, we define “ambiguity sets” around a baseline model *P* and “predictive uncertainty for QoIs.” Although the definitions of predictive uncertainty (section S6) and model sensitivity indices (Eq. 6) are natural and rather intuitive, it is not obvious that they are practically computable. A key mathematical finding for PGMs, demonstrated in theorem S1, is that that the guarantees *P* is a Gaussian Bayesian network (*G*), the ranking indices in Eq. 7 are computed using Eq. 9.

### Selecting new high-quality data or improved physical model for model correctability

Given a baseline model *P* and the sparse dataset for each submodel sampled from an unknown model *Q*, we can build an improved baseline model *P* for our ORR model following the steps below.

Step 1: Find suitable data-based η* _{l}*’s:

*Q*is the surrogate model given by the KDE/histogram, using eqs. S94 and S95.

Step 2: Calculate the model uncertainty guarantees for a given QoI using eq. S58 (or eq. S60 for the general PGM)

Step 3: Select the *l** component *X*_{l}* of the PGM with the worst guarantees

Step 4: Reduce *X*) = min { − Δ*G*_{1}, − Δ*G*_{4}} ∣ Δ*G*_{O}, we have that*l** component(s) of the PGM, we seek the most useful additional data, namely the data that tightens (reduces) the guarantees in Eq. 9. Note that the guarantees consist of two parts: the moment generating function (MGF) and the model misspecification parameter η. Therefore, adding informative data can reduce the MGF in Eq. 9 (and, thus, the uncertainty guarantees *27*), additional data can improve model *P* and reduce the model misspecification η (see section S8).

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/42/eabc3204/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgements:**J.L.L. and D.G.V. acknowledge valuable discussions with J. Lym, K. Alexopoulos, and G. Wittreich.

**Funding:**The research of M.A.K. was partially supported by NSF TRIPODS CISE-1934846 and the Air Force Office of Scientific Research (AFOSR) under the grant FA-9550-18-1-0214. The research of J.F. was partially supported by the Defense Advanced Research Projects Agency (DARPA) EQUiPS program under the grant W911NF1520122, and part of J.F.’s work was completed during his PhD at UMass Amherst. J.L.L. and D.G.V. acknowledge support by the RAPID Manufacturing Institute, supported by the Department of Energy (DOE) Advanced Manufacturing Office (AMO), award number DE-EE0007888-9.5. RAPID projects at the University of Delaware are also made possible, in part, by funding provided by the State of Delaware. The Delaware Energy Institute acknowledges the support and partnership of the State of Delaware in furthering the essential scientific research being conducted through the RAPID projects. J.L.L.’s research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under contract no. DE-AC02-05CH11231. The 2019 to 2020 Blue Waters Graduate Fellowship to J.L.L. is also acknowledged.

**Author contributions:**J.F. developed and implemented the PGM model. J.F. and M.A.K. developed UQ for PGMs. M.A.K. conceived the use of PGMs for model uncertainty, as well as the related information-theoretic tools. J.L.L. developed the physical model, and D.G.V. thought of the idea to make UQ explainable, to apply the PGM model to the ORR, and the need to apportion error to different model inputs for sparse data and missing physics. All authors contributed to writing.

**Competing interests:**The authors declare that they have no competing interests.

**Data and materials availability:**All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All underlying DFT and experimental data are available on Zenodo. Software will be made available upon request. Additional data related to this paper may be requested from the authors.

- Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).