Context for interpreting equilibrium climate sensitivity and transient climate response from the CMIP6 Earth system models


Science Advances  24 Jun 2020:
Vol. 6, no. 26, eaba1981
DOI: 10.1126/sciadv.aba1981


For the current generation of Earth system models participating in the Coupled Model Intercomparison Project Phase 6 (CMIP6), the range of equilibrium climate sensitivity (ECS, a hypothetical value of global warming at equilibrium for a doubling of CO2) is 1.8°C to 5.6°C, the largest of any generation of models dating to the 1990s. Meanwhile, the range of transient climate response (TCR, the surface temperature warming around the time of CO2 doubling in a 1% per year CO2 increase simulation) for the CMIP6 models of 1.7°C (1.3°C to 3.0°C) is only slightly larger than for the CMIP3 and CMIP5 models. Here we review and synthesize the latest developments in ECS and TCR values in CMIP, compile possible reasons for the current values as supplied by the modeling groups, and highlight future directions. Cloud feedbacks and cloud-aerosol interactions are the most likely contributors to the high values and increased range of ECS in CMIP6.


One of the earliest concepts of the climate system response to increasing carbon dioxide (CO2) concentration comes from a simple model of the relationship between forcing and response (1, 2)

$$N = F + \lambda \Delta T \tag{1}$$

where for the net top-of-atmosphere energy balance, N, and a given radiative forcing, F, there is a global surface temperature response, ΔT, multiplied by a feedback factor, λ. For a given forcing associated with a doubling of atmospheric CO2 concentration (with a radiative forcing of about 3.7 W m−2), at equilibrium N = 0, we can then solve for ΔT, a quantity known as the “equilibrium climate sensitivity” (ECS). This simple concept was first applied to gauge climate sensitivity in the earliest climate model experiments that were performed in the 1970s with models that included highly simplified oceans. In those experiments, CO2 was instantaneously doubled, and the model was run to a statistical equilibrium state. The warming that occurred was defined as the ECS or just simply the climate sensitivity of the system (1). This simplistic view of the response of the Earth system to a change in external forcing from increasing CO2 was viewed at that time as informing the magnitude of the relative climate change Earth could experience in the future and that range for the doubling of CO2 was first assessed to be 1.5° to 4.5°C, based on physical understanding and results from two early idealized models (1).
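
As a rough illustration of Eq. 1, the Python sketch below solves for the equilibrium warming when N = 0, which gives ΔT = −F/λ. With the sign convention of Eq. 1, λ has to be negative for a stable equilibrium. The function name and the three feedback values are hypothetical and chosen only for illustration; the forcing of 3.7 W m−2 is the value quoted above.

```python
# Zero-dimensional energy balance: N = F + lambda * dT (Eq. 1).
# At equilibrium N = 0, so dT = -F / lambda.

F_2X = 3.7  # W m-2, approximate forcing for doubled CO2 (value quoted in the text)


def equilibrium_warming(forcing, feedback):
    """Return the equilibrium dT (deg C) for a forcing (W m-2) and a
    feedback factor (W m-2 per deg C, negative in this sign convention)."""
    if feedback >= 0:
        raise ValueError("feedback must be negative for a stable equilibrium")
    return -forcing / feedback


# Hypothetical feedback factors, for illustration only (not model values).
for lam in (-2.0, -1.2, -0.8):
    ecs = equilibrium_warming(F_2X, lam)
    print(f"lambda = {lam:+.1f} W m-2 K-1 -> ECS = {ecs:.2f} C")
```

A feedback factor closer to zero yields a larger ECS, which is why modest differences in modeled feedbacks can translate into a wide spread in sensitivity.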

Since then, ECS has been estimated from each generation of Earth system models as a standard metric of their response to increased CO2. The transient climate response (TCR), defined as the global temperature change at the time of CO2 doubling in a 1% per year compounded CO2 increase experiment, has also become a standard metric of model sensitivity as the response to increasing CO2. Because modeling groups routinely calculate ECS and TCR for each new model version, the simulations required to calculate these metrics are now included in the standard Diagnostic, Evaluation and Characterization of Klima (DECK) experiments, which are requirements for participation in the Coupled Model Intercomparison Project Phase 6 (CMIP6) (3). The relationship between ECS and TCR, and how these metrics have varied over the generations of Earth system models, has been a subject of intense interest and is addressed here. The present generation of CMIP6 models has a greater range of ECS, with higher values at the upper end of the range than previous generations of models. This has elicited scrutiny, because there are implications for the magnitude of warming in future climate projections and associated policy-relevant mitigation strategies (4). There are also questions as to how the TCR range relates to corresponding values of ECS.

Here, we review the historical context for these metrics from previous generations of CMIP models in relation to the current generation in CMIP6 (3), discuss the relationship between TCR and ECS with regard to time scales of response, address factors at work that could be producing higher values of ECS in some of the CMIP6 models, as identified by the modeling groups, and point to unresolved questions surrounding ECS and TCR.


In the era of the Intergovernmental Panel on Climate Change (IPCC) assessments, starting in the 1990s, each assessment largely maintained the same assessed range for ECS (1.5° to 4.5°C), and the climate models used to estimate ECS were generally close to that range (Fig. 1). Early on, global atmospheric models coupled to simple nondynamic “slab” ocean (or mixed layer ocean) models fit into this paradigm and were assessed in the first IPCC report (5). CO2 was instantaneously doubled in such a model configuration and then run to equilibrium, which was usually attained in about 20 years or so (6, 7). Sea ice in these models was usually a simple thermodynamic formulation.

Fig. 1 Historical values of ECS and TCR.

Assessed values of ECS (blue bars) and TCR (red bars), ranges from models of ECS (orange bars), and TCR (green bars; single value from the AR1 is green dot); numbers are individual model values of ECS from CMIP5 and CMIP6 (available on the ESGF as of March 2020). The numbers denoting individual models for CMIP5 are listed in Table 1 and those for CMIP6 in Table 2. Sources for values: AR1: table 3.2a of [IPCC First Assessment Report Ch. 3 (5)]; (ECS, 19 models with variable clouds; TCR, 1 model). AR2/CMIP1: figure 6.4 and table 6.3 of [IPCC Second Assessment Report Ch. 6 (18)] (ECS, 9 models; TCR, 13 models). AR3/CMIP2: table 9.1 of [IPCC Third Assessment Report, Ch. 9 (20)] (ECS, 14 models; TCR, 19 models). AR4/CMIP3: figure 10.25 of [IPCC Fourth Assessment Report Ch. 10 (21)] (ECS and TCR, 19 models). AR5/CMIP5: figure 9.42 and table 9.5 of [IPCC Fifth Assessment Report Ch. 9 (25)] (ECS, 23 models; TCR, 30 models; this differs somewhat from currently available CMIP5 models in the ESGF in Table 1). CMIP6: ECS (37 models) and TCR (37 models), with data available from a total of 39 models on the ESGF in March 2020 (Table 2).

The estimates of the ECS using these models were likely comparable to later models using a full ocean model component if the latter were run to equilibrium (8, 9). However, the transient response affects the estimate of ECS because the time response of the coupled model can change with warming, along with time-varying feedbacks and patterns of surface warming (10–12). This is because the response of the dynamic ocean can critically affect the transient response to CO2 increases on time scales longer than a few years.

In the late 1980s, as soon as computer power allowed, global atmospheric models that previously had idealized continental outlines started to be run with realistic distributions of land and ocean and were synchronously coupled to coarse-grid (about 5° latitude-longitude) dynamical ocean models, still with simple sea-ice formulations (13, 14). With this class of model [here referred to as atmosphere-ocean general circulation model (AOGCM)], it was possible to do a time-dependent or “transient” experiment where CO2 could be gradually increased. This more realistically represented what could plausibly be expected to happen in the real climate system on the century time scale. Therefore, a common transient experiment was devised whereby CO2 was increased 1% per year compounded, and the surface temperature increase at the time of CO2 doubling (about year 70) was computed as the “climate change” due to a doubling of CO2 (14, 15). The transient warming in such an experiment was less than at equilibrium due to the large oceanic heat capacity. An early version of this type of model appeared in the first IPCC assessment (16), with a warming at the time of CO2 doubling of 2.3°C (a single value given for IPCC 1990 in Fig. 1). Results from three more groups’ transient experiments were included in the 1992 IPCC update (17), although these models used somewhat different experimental designs for their simulations and were only qualitatively comparable.
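
The "about year 70" timing of CO2 doubling follows directly from compounding at 1% per year. The short Python sketch below works through that arithmetic; the function name is hypothetical.

```python
import math


def years_to_multiply(factor, rate=0.01):
    """Years of compounded growth at `rate` per year needed to multiply
    the CO2 concentration by `factor`."""
    return math.log(factor) / math.log(1.0 + rate)


doubling = years_to_multiply(2.0)
quadrupling = years_to_multiply(4.0)
print(f"doubling after {doubling:.1f} years, quadrupling after {quadrupling:.1f} years")
```

Doubling is reached just before year 70, and quadrupling takes twice as long, just before year 140.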

In the early 1990s, a growing number of international modeling groups were running such 1% CO2 increase experiments to quantify the nonequilibrium response to a gradual increase of CO2. This generation of models was assessed in the IPCC AR2 (18). Recognizing that there should be agreement on the experimental design used by all the modeling groups to facilitate intercomparison, an early phase of CMIP in the 1990s [CMIP2 (19)] specified that a 1% CO2 increase experiment should be run to provide a standard measure of the transient response of the climate system. It was hoped that this intercomparison would lead to a greater understanding of the reasons for differences in the models’ time-dependent responses. With the emergence of a range of plausible future emission scenarios furnished by the integrated assessment modeling community, climate models were additionally run in time-evolving climate change projections that were assessed in the IPCC AR3 with the Special Report on Emission Scenarios (SRES) scenarios (20) and in subsequent IPCC assessments (21). About this time, modeling groups started to run several standard experiments during the course of model development that included an instantaneous CO2 doubling experiment with a mixed layer formulation to get an estimate of the ECS, and a 1% transient CO2 increase experiment with the fully coupled model to obtain what was termed the TCR in the IPCC AR3 (20). These experiments provided baseline metrics to compare responses of coupled AOGCMs from one generation to the next.

In the IPCC AR3, it was argued that TCR, rather than ECS, was a more relevant metric of model response to increasing CO2. It was straightforward to calculate in a modeling sense because the TCR is defined using the AOGCMs’ temperature response itself, and the overall magnitude of TCR was thought at that time to be more comparable to the time scale and magnitude of the response in the real world over the 21st century. In addition, there were factors that complicated the calculation and interpretation of ECS that were emerging by the late 1990s. Modeling groups were struggling to maintain nondynamic slab (or mixed layer ocean) model formulations coupled to the atmospheric model that were comparable to their fully coupled versions with a dynamical ocean coupled to the atmosphere (i.e., their AOGCMs). Sea-ice formulations in the AOGCMs were becoming more complex (including, for example, sea-ice dynamics). Sea surface temperature errors in the mixed layer models (due, in part, to their lack of ocean dynamics) were being corrected by different groups in different ways using a technique generally called “Q-flux” (22). A further complication related to the relevance of ECS was that paleoclimate studies were finding that other factors could influence the equilibrium response of the real world, such as vegetation, biogeochemistry, and dust, and most of these were not accounted for in the traditional definition of ECS (23).

A shortcut was proposed to estimate ECS (24). This resulted in a metric sometimes termed as the “effective climate sensitivity,” but for our purposes here, we will refer to it as the “ECS calculated by the Gregory method.” This will distinguish it from the ECS values obtained with earlier slab oceans coupled to atmospheric models and run to equilibrium. The ECS calculated by the Gregory method is derived from a fully coupled Earth system model and does not require equilibrium to actually be achieved. In the Gregory method, CO2 is instantaneously quadrupled in a fully coupled Earth system model and run for 150 years. As the surface temperature asymptotes toward equilibrium, the slope of the time-evolving curve of the net top-of-atmosphere radiance against the surface temperature is calculated to extrapolate the eventual temperature increase at equilibrium some time far in the future for a doubling of CO2, assuming that there is a roughly linear response that is half of the warming from a quadrupling of CO2. In contrast to the ECS values in previous IPCC assessments using atmosphere models coupled to nondynamic slab oceans, the Gregory method was applied to CMIP5 coupled models assessed in the IPCC AR5 (25). The Gregory method is still the most frequently used approach to calculating ECS from AOGCM simulations, although complications arising from this method have led to various other alternatives to be proposed.
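
The regression step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by any modeling group: the function name is hypothetical, and the 150-year series is synthetic (a single exponential approach to equilibrium with an assumed 4xCO2 forcing of 7.4 W m−2, an assumed feedback of −1.0 W m−2 per °C, and small random noise), not model output.

```python
import numpy as np


def gregory_ecs(delta_t, net_toa):
    """Estimate ECS from annual-mean anomalies of an abrupt 4xCO2 run.

    delta_t : global-mean surface temperature anomaly (deg C), one value per year
    net_toa : net top-of-atmosphere flux anomaly N (W m-2), one value per year
    Returns (ecs, forcing_4x, feedback).
    """
    # Ordinary least-squares line N = intercept + slope * dT.
    slope, intercept = np.polyfit(delta_t, net_toa, 1)
    # Extrapolate to N = 0 to get the equilibrium warming for 4xCO2.
    dt_eq_4x = -intercept / slope
    # Halve it, assuming the 2xCO2 response is half the 4xCO2 response.
    return dt_eq_4x / 2.0, intercept, slope


# Synthetic data for illustration only (all parameter values are assumptions).
rng = np.random.default_rng(0)
years = np.arange(150)
true_f4x, true_lambda = 7.4, -1.0
dt = (true_f4x / -true_lambda) * (1.0 - np.exp(-years / 30.0))
n = true_f4x + true_lambda * dt + rng.normal(0.0, 0.1, size=years.size)

ecs, f4x, lam = gregory_ecs(dt, n)
print(f"ECS = {ecs:.2f} C, F_4x = {f4x:.2f} W m-2, lambda = {lam:.2f} W m-2 K-1")
```

Because the synthetic series follows a single straight line in (ΔT, N) space, the regression recovers the assumed sensitivity closely. Real model output curves away from a straight line, which is one source of the complications mentioned above.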


A compilation of ranges of TCR and ECS for six generations of climate models dating back to 1990 and through the current CMIP6 generation is shown in Fig. 1, along with the expert judgment–assessed ranges of ECS and TCR from the various IPCC assessments. The TCR and ECS values shown in Tables 1 and 2 have been consistently calculated with the Earth System Model Evaluation Tool (ESMValTool) version 2.0 (26, 27) for the individual CMIP5 and CMIP6 models that were available from the ESGF at the time of publication. All figures of the paper have been produced with ESMValTool v2.0. ECS uses the Gregory method from a 150-year run of an instantaneously quadrupled CO2 simulation. TCR is calculated as the change in the 20-year average global mean surface temperature, centered around the time of CO2 doubling (years 60 to 79) relative to the 140-year period in the pre-industrial (PI) control that includes the time period of the 1% per year experiment, with the global temperature in the PI control smoothed by applying a linear 140-year fit to account for residual drift. ECS and TCR values calculated by the modeling groups and reported in their papers may differ slightly from those in Tables 1 and 2 that are calculated by the ESMValTool. For example, the Community Earth System Model Version 2 (CESM2) has a value for ECS in Table 2 of 5.2°C, while there are published values of 5.3°C, with TCR values of 2.1° and 2.0°C, respectively (9).
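
The TCR calculation described above can be sketched in Python as follows. This is one reading of the description rather than the ESMValTool code itself: the function name is hypothetical, the anomaly is taken year by year against a linear fit to the parallel PI control, and "years 60 to 79" is assumed to count from year 0.

```python
import numpy as np


def tcr_from_series(tas_1pct, tas_picontrol):
    """TCR (deg C) from 140 annual global-mean temperatures of the 1% per year
    run and of the PI control covering the same 140 years."""
    tas_1pct = np.asarray(tas_1pct, dtype=float)
    tas_picontrol = np.asarray(tas_picontrol, dtype=float)
    if tas_1pct.shape != (140,) or tas_picontrol.shape != (140,):
        raise ValueError("expected two 140-year annual-mean series")
    years = np.arange(140)
    # A linear fit to the control smooths it and captures residual drift.
    slope, intercept = np.polyfit(years, tas_picontrol, 1)
    control_fit = intercept + slope * years
    anomaly = tas_1pct - control_fit
    # 20-year mean centered on the time of CO2 doubling (years 60 to 79,
    # assuming the first simulated year is year 0).
    return anomaly[60:80].mean()


# Synthetic series for illustration only: a drifting control plus linear warming.
yrs = np.arange(140)
control = 14.0 + 0.001 * yrs
ramp = control + 0.03 * yrs
print(f"TCR = {tcr_from_series(ramp, control):.3f} C")
```

Subtracting the fitted control rather than a single control mean keeps any slow drift in the control from being counted as forced warming.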

Table 1 ECS and TCR values (°C) calculated from CMIP5 model data available on the ESGF in March 2020.

Model numbers denote individual models (in second column) in Figs. 1, 2, and 4. Model acronyms are defined at

Table 2 ECS and TCR values (°C) calculated from CMIP6 model data available on the ESGF in March 2020.

Model numbers denote individual models (in second column) in Figs. 1, 2, and 4. Model acronyms are defined at, and modeling groups at


As noted above, the original assessed range of ECS was 1.5° to 4.5°C and was based on a few early simplified models (1). The range of ECS diagnosed from model simulations has remained near the assessed range until the present CMIP6 generation where the range has expanded to 1.8° to 5.6°C from 37 models, the largest range of ECS in any generation of model to date (Fig. 1; although not all CMIP6 models are yet included). Table 2 shows ECS and TCR, along with the multimodel means and SDs for 39 CMIP6 models compiled for this study (inadequate data on the ESGF preclude calculating ECS and TCR from four of the models, so there are 37 models for ECS and 37 models for TCR in Table 2). The SD of ECS in CMIP6 is 1.1°C, which can be compared to the lower value of 0.7°C in CMIP5 (Table 1), reflecting the previous smaller range. It is possible that the SD is influenced by a single outlier, but the distribution of individual models in Fig. 1 and Table 2 shows two CMIP6 models with ECS values less than 2°C (1.8° and 1.9°C), while there are six models with ECS values above 5°C.

By contrast, the range in TCR of 1.3° to 3.8°C in CMIP1 (a range of 2.5°C) has shrunk to 1.3° to 3.0°C in CMIP6 (a range of 1.7°C), with an SD of 0.4°C (Fig. 1 and Table 2). This can be compared to the CMIP5 range of 1.5°C reported in the AR5 (in models available now that are compiled in Table 1, there is a range of 1.4°C, with a comparable SD value of 0.4°C). The CMIP6 range of 1.7°C indicates a modest upward shift from the range of 1.5°C in both CMIP3 and CMIP5. The multimodel mean values of ECS and TCR in CMIP5 are 3.2° and 1.8°C, respectively (Table 1), while comparable values of those in CMIP6 are 3.7° and 2.0°C (Table 2).

When comparing the ranges of ECS and TCR in Fig. 1, the changes in the range are larger than changes in mean values of ECS, with the latter being more consistent from generation to generation of model. For example, the highest ECS range is 3.8°C in CMIP6 compared to the lowest of 2.3°C in the CMIP3 models in the AR4. However, average values of ECS have remained near 3.5° ± 0.2°C for all generations of models. For TCR, starting with the CMIP2 models in the AR3, both the range and average values have remained relatively stable, with the average varying around 2.0° ± 0.2°C and the range staying near 1.7° ± 0.2°C. It is also interesting to note that the high end of the assessed range of ECS, 4.5°C, has been exceeded by the high end of the multimodel range in every IPCC assessment except for the AR4. Similarly, the low end of the multimodel range has been higher than the low end of the assessed range of ECS in every IPCC assessment, which is similar to the two times that the TCR range has been assessed in the AR4 and AR5. Thus, over the generations of models, the multimodel ranges of ECS and TCR have been higher than the assessed ranges in the IPCC assessments, with the exception of the CMIP3 models in the AR4. It should be kept in mind that a range of model results cannot be interpreted as an estimate of uncertainty; the range might be too wide if unrealistic models are included, or it could be too small if many models are missing the same process or feedback.


There have been efforts to directly relate TCR to ECS. For example, in the IPCC AR4 from the CMIP3 multimodel ensemble, TCR was plotted as a function of ECS [figure 10.25 of (21)]. That TCR should increase with ECS is not unexpected because on all time scales, a more sensitive model can be expected to warm more. More refined analysis of simplified climate models suggests that the increase of TCR with ECS should be less than linear. A number of studies explained that there is a nonlinear relationship governed by a ratio involving two parameters, ECS and heat uptake efficiency. This implies that if all models have similar efficiency in sequestering heat, then the more sensitive models will, at any point in time, realize a smaller fraction of their eventual warming (28–32). The IPCC AR4 used the CMIP3 multimodel ensemble to show a nonlinear behavior consistent with this prediction [figure 10.25 of (21)]. For more realistic transient climate change experiments, a similar result holds (33). Although in the IPCC AR5 [figure 9.42b of (25)] a linear fit to the TCR-ECS relationship intersects the ordinate well above 0, this violates basic physical considerations, which imply that the TCR should be 0 if the ECS is 0. If this constraint is applied, then the AR5 collection of model results is again consistent with a simple nonlinear relationship between TCR and ECS. A nonlinear relationship between TCR and ECS might also be explained by other factors, of course, and numerous studies have shown that warmer base states and higher CO2 concentrations can produce proportionately larger feedbacks and thus contribute to this nonlinear relationship (32, 34–36).
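
One common simplification that produces this less-than-linear behavior assumes that, during the ramp, ocean heat uptake is proportional to warming, N ≈ κΔT, where κ is the heat uptake efficiency. Combining this with Eq. 1 at the time of doubling gives TCR = ECS / (1 + κ·ECS/F), with F the forcing for doubled CO2. The Python sketch below evaluates that expression. The proportional-uptake assumption, the function name, and the value κ = 0.7 W m−2 per °C are illustrative assumptions here and are not taken from the cited studies.

```python
F_2X = 3.7  # W m-2, approximate forcing for doubled CO2 (value quoted in the text)


def tcr_from_ecs(ecs, kappa=0.7, f_2x=F_2X):
    """Approximate TCR (deg C) for a given ECS (deg C), assuming ocean heat
    uptake N = kappa * dT during the ramp (kappa in W m-2 per deg C)."""
    return ecs / (1.0 + kappa * ecs / f_2x)


# Same assumed heat uptake efficiency for every hypothetical model.
for ecs in (2.0, 3.0, 4.0, 5.0):
    tcr = tcr_from_ecs(ecs)
    print(f"ECS = {ecs:.1f} C -> TCR = {tcr:.2f} C, realized fraction = {tcr / ecs:.2f}")
```

Under these assumptions the curve passes through the origin, and the realized fraction TCR/ECS falls as ECS rises, in line with the argument above.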

Past studies also have indicated that global climate feedbacks likely vary as climate changes (24, 37, 38). Subsequently, this has been shown to be true in many models (11, 39–41) and is likely related to variations in the patterns of surface temperature and radiative feedbacks (12). Thus, the increase in feedback (36, 39), due to either a transient pattern effect (38) or some kind of nonlinearity in the equilibrium response, such as feedback temperature dependence (33), can be demonstrated through a method that attempts to account for both types of phenomena in a ramp experiment (42).

Consequently, the ECS obtained by running fully coupled Earth system models to equilibrium over multimillennia is also likely larger than estimates of ECS computed using the Gregory method (43). This result has also been documented in studies that use paleoclimate data (35).

In addition, there are different time scales of response in the system, with some feedbacks and responses acting on the subcentury time scale and others operating on the multicentury time scale, with the nonlinear relationship between ECS and TCR indicative of those different time scales of response (41, 44, 45). The linear fit of ECS versus TCR in the CMIP6 models has a similar R2 value of 0.61 compared to 0.60 in CMIP5 (Fig. 2; for sufficient available data on ESGF, see note in figure caption).

Fig. 2 ECS as a function of TCR.

(A) From the CMIP5 models in the IPCC AR5 (black line is linear fit); (B) same as (A) except for CMIP6 models (black line is a linear fit). Note that 27 models are plotted for CMIP5 (Table 1) compared to a total of 23 and 30 models that supplied ECS and TCR values, respectively, to the IPCC AR5 used for the ranges in Fig. 1. The greater number of models plotted here denotes those with sufficient available data on the ESGF to perform corresponding ECS and TCR calculations, as defined in the ESMValTool discussed in the text. The R2 values are given in the upper left parts of each panel. The numbers denoting individual models for CMIP5 in (A) are listed in Table 1 and those for CMIP6 in (B) in Table 2.

This time scale dependence of feedbacks and response can be demonstrated in Fig. 3 where for the 37 currently available CMIP6 models with sufficient data on the ESGF, the multimodel mean ECS from the Gregory method, calculated over the full 150-year period of the 4xCO2 experiment, is 3.7°C. For the first 20 years of the simulations, however, the sensitivity is 3.3°C, and for the last 130 years, the value is 4.0°C. This indicates a time-varying feedback strength, leading to different time scales of response in the climate system.
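
The dependence on regression window can be reproduced qualitatively with a toy calculation. In the Python sketch below, the net flux is given a slight curvature against temperature so that the effective feedback weakens as the system warms. The function name, the two-time-scale temperature curve, and all coefficients are assumptions chosen for illustration; none of them comes from CMIP6 output.

```python
import numpy as np


def gregory_window(delta_t, net_toa, start, stop):
    """Gregory-style ECS (deg C) using only years [start, stop) of an
    abrupt 4xCO2 run."""
    slope, intercept = np.polyfit(delta_t[start:stop], net_toa[start:stop], 1)
    return (-intercept / slope) / 2.0


# Synthetic 150-year series (illustrative assumptions, not model data):
# a fast and a slow warming component, and a net flux that is slightly
# convex in dT so the feedback becomes less negative with warming.
years = np.arange(1, 151)
dt = 4.0 * (1 - np.exp(-years / 4.0)) + 3.5 * (1 - np.exp(-years / 200.0))
n = 7.4 - 1.2 * dt + 0.03 * dt**2

early = gregory_window(dt, n, 0, 20)    # first 20 years
late = gregory_window(dt, n, 20, 150)   # last 130 years
full = gregory_window(dt, n, 0, 150)    # entire 150 years
print(f"first 20 yr: {early:.2f} C, last 130 yr: {late:.2f} C, all 150 yr: {full:.2f} C")
```

The early window gives a lower estimate than the late window, mirroring the direction of the 3.3°C versus 4.0°C contrast reported above, although the magnitudes here depend entirely on the assumed coefficients.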

Fig. 3 ECS calculated for the CMIP6 models in Table 2 using the Gregory method over different time scales.

Using the entire 150-year 4xCO2 experiment (black line), there is an ECS value of 3.7°C; using only the first 20 years (blue dots and blue line), there is an ECS of 3.3°C; and using the last 130 years, there is an ECS of 4.0°C (orange dots and orange line).


The question remains as to why the range of ECS has increased in CMIP6, with the upper end of the range extending well beyond the canonical 4.5°C value and higher than previous generations of models (Fig. 1). Various researchers, as well as the modeling groups, have been attempting to answer that question. One possibility is that the newer prognostic aerosol schemes that include aerosol-cloud interactions could have produced overly large negative radiative forcing, which then implied a need for a stronger model response to CO2 if the model was to reproduce the historical temperature response. Such relationships between amplitude of aerosol forcing and ECS have been noted in previous generations of models (46, 47). However, because the aerosol forcing varies with time, it is difficult to tune the ECS based on responses to aerosols over different periods of the 20th century (48). On the basis of the information in Table 3 provided by a subset of CMIP6 modeling groups, Fig. 4 shows that there is a weak relationship between amplitude of aerosol forcing and ECS, with larger negative present-day aerosol forcing associated with larger ECS (linear fit with an R2 of only 0.36 in Fig. 4). A simple explanation that these results might result from a relationship between forcing and aerosol-dependent cloud-radiative feedback responses that depend on whether aerosols are predicted or prescribed does not seem to hold. Some models have prognostic aerosol schemes with large negative forcing but low ECS (e.g., MRI-ESM2), while others have low values of negative aerosol forcing but high ECS (e.g., GFDL-CM4). For the latter, the composition (emissions) feedbacks present in GFDL-ESM4, but not in GFDL-CM4, can operate largely independently of the direct and indirect forcing; instead, they are feedbacks. In models with advanced treatments of biogeochemistry and vegetation, warming can induce changes in emissions, which alter the aerosol fields. 
In the GFDL-ESM4’s case, these provide negative feedbacks and reduce ECS. One group with high ECS and large negative aerosol radiative forcing (CESM2) ascribes the high ECS value to cloud feedbacks and aerosol-cloud interactions related to the details of stratiform cloud microphysics and associated ice nucleation, turbulence, rain formation and evaporation processes, and SO2 lifetime (49). Specifically, changes made to increase high-latitude supercooled liquid water, and to adjust warm rain susceptibility to aerosols in shallow clouds, have increased cloud feedbacks in that model. This would apply to anthropogenic and natural aerosols and contribute to higher values of both TCR and ECS. Another group in that category (HadGEM3-GC3.1-LL) has diagnosed the differences from an earlier model version with a lower ECS as arising from changes in the shortwave cloud-radiative feedback in the midlatitudes (mainly over the Southern Ocean), with the introduction of a new aerosol scheme and the development of a new mixed-phase cloud scheme (50). Meanwhile, two models with prognostic direct aerosol effect only but with no aerosol-cloud feedbacks (the two INM models) and two others with prescribed aerosols (the MPI and IPSL models), have similar relatively low values of aerosol forcing (around −0.6 W m−2) but with ECS values that range from 1.8° to 4.6°C (Fig. 4).

Table 3 Subsample of CMIP6 models shown in Fig. 1, with information supplied by the modeling groups regarding details of aerosol forcing and formulation and possible reasons for ECS values.

For the GFDL models, the higher sensitivities in parentheses denoted by asterisks result from longer runs and attempts to filter out unforced variability. Model acronyms are defined at, and modeling groups at

Fig. 4 Effective radiative forcing from aerosols versus ECS.

Values supplied by the modeling groups (Table 3); black line is linear fit with R2 of 0.36. The numbers denoting individual models are listed in Table 2.

One common theme that emerges in Table 3 from five of the six models with ECS values greater than 5°C [E3SM, CESM2, UK Earth System Model (UKESM1), HadGEM3, and Canadian Earth System Model version 5 (CanESM5)], all with prognostic aerosol direct and indirect effects, is the notable role played by cloud feedbacks and/or cloud-aerosol interactions in high ECS. Four of the five have relatively large negative values of present-day aerosol effective radiative forcing (ERF), ranging from −1.1 to −1.7 W m−2. Two other models with prognostic aerosol direct and indirect effects and relatively large negative aerosol ERF values of −1.2 W m−2 (NorESM2-LM and MRI-ESM2) have lower values of ECS (2.5° and 3.1°C, respectively), but neither cites cloud feedbacks as being a notable part of its lower ECS value (Table 3). However, a model from one group (GFDL-ESM4) has ECS values that decreased from CMIP5 to CMIP6, and of the six factors cited as playing a role in this decrease, three relate to negative cloud-aerosol feedbacks (Table 3).

Additional evidence for the role of cloud feedbacks in models with high ECS has been ascribed to decreasing extratropical low cloud coverage and albedo with increasing temperature. This produces stronger positive shortwave cloud-radiative feedbacks (i.e., warmer temperatures produce less low clouds, allowing more incoming solar and more warming, and so on) that are directly related to how clouds are represented in those models (51). There are indications from some paleoclimate and observational studies that the higher estimates of ECS are less credible than the lower estimates (36, 52–54), while other paleoclimate studies suggest a possibility of higher ECS values (35, 55). By applying observations in an emergent constraints context, there is evidence that the ECS is likely at the lower end of the IPCC range (53), with unlikely higher estimates of ECS (55), although there is other evidence supporting higher values of ECS. Thus, although there appears to be no single property in the current generation of CMIP6 models to which the increased range and higher values on the upper end of ECS can be attributed, cloud feedbacks and cloud-aerosol interactions in models with prognostic aerosol schemes seem to be playing an important role.


This brings us to the question of what is the most appropriate policy-relevant metric to assess the Earth system model response to an increase of CO2. This inevitably depends on the time scale of interest, with the century time scale response represented by TCR being roughly consistent over multiple generations of models. Meanwhile, ECS, representing the century and longer time scales, is constrained less, because the moderating influence of ocean heat uptake does not ultimately affect it. Thus, we return to the point made in the IPCC AR3 (20). While ECS and TCR represent different aspects of the response of the Earth system, on time scales of the next several decades, TCR could be more relevant (56), although evidence has also been presented that for model simulations for the next 50 to 100 years, ECS is actually a better predictor (57). Because the climate system will never be in equilibrium, ECS is an abstract quantity that can never be observed. As noted above, ECS has proved to have numerous computational difficulties related to time scales of response, along with model-related complications, regarding how different feedbacks are modeled in different components of the climate system. It is still an open question as to how relevant ECS is to understanding historical climate change or the transient climate system response to increasing CO2. The only partially understood relationship between ECS and TCR raises unresolved questions and research challenges regarding time scales of feedbacks in the climate system.

In addition, the decreasing range of TCR over generations of models, contrasted with the recent increase in range of ECS in the CMIP6 models, points to an interesting research question regarding why this has occurred. It likely involves processes connected to ocean heat uptake, a better quantification of which would require improved temperature observations through the full depth of the global ocean (58), as well as increased understanding of various feedbacks in the climate system. Nonlinearities in the feedbacks and pattern effects (e.g., Fig. 3) are far less severe in the transient case than close to equilibrium. Another factor that could play a role is that some models (e.g., HadGEM3-GC3.1 and ACCESS-CM2) interactively redistribute their climatological ozone concentrations during the abrupt 4xCO2 and scenario runs to retain the bulk of the ozone in the stratosphere as the tropopause height rises. This acts to reduce the ECS that would otherwise occur by approximately 10 to 15% (59).

Because the higher ECS values in some models are related to cloud feedbacks and cloud-aerosol interactions, a major research question that needs to be pursued is what is the actual nature and magnitude of cloud feedbacks in general and cloud-aerosol interactions in particular. Making progress will require enhanced observations that would provide new insights into the processes involved with cloud microphysics. Additional knowledge from new observations and improved modeling would be desired to quantify the details of how clouds interact with and are affected by aerosols, both natural and anthropogenic. This is because in calculations of ECS and TCR, there are no changes in anthropogenic aerosols, but natural aerosols in a given model’s prognostic aerosol scheme can respond to the climate changes brought about by increases of CO2. The World Climate Research Programme (WCRP) Grand Challenge on Clouds, Circulation, and Climate Sensitivity ( is attempting to do just that. The assessment of climate sensitivity undertaken by the WCRP (60) delivers a new range of ECS based on multiple lines of evidence and provides a new framework for model assessment. It will be critically important that all models are tested against this rigorous framework using new methods, observations, and process evaluation. If the high-ECS models fall outside of this new range, then the framework will enable process-based understanding of the reasons. Note also, however, that some models may fall within the ECS-assessed range but could do so through a cancelation of biases. It will be equally important to identify these and work to improve them if the modeling community is to be confident that models, whether inside or outside the newly assessed range of ECS, are really delivering reliable projections of future climate. 
In any case, ECS methods should be standardized through a more comprehensive comparison of ECS estimates, using multimillennial climate perturbation simulations such as those conducted in LongRunMIP (42). In addition, slab ocean model comparisons would be useful to better understand the causes of the differences among methods and to derive a more robust estimate of climate sensitivity from the current generation of models.
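To make concrete what an "ECS method" involves, the following is a minimal sketch of one commonly used regression approach, built directly on the linear energy balance N = F + λΔT from Eq. 1: regress annual-mean N against ΔT from an abrupt forcing run, read F from the intercept and λ from the slope, and solve for ΔT at N = 0. The data here are synthetic, and the function names, the 7.4 W m−2 forcing (twice the roughly 3.7 W m−2 for a doubling, assuming logarithmic scaling), the feedback value, and the time scale are all illustrative assumptions, not values or code from any model or from ESMValTool.

```python
import math
import random


def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def effective_sensitivity(delta_t, net_toa, forcing_multiple=2.0):
    """Estimate warming per CO2 doubling from abrupt-forcing anomalies.

    Fits N = F + lambda * dT, then sets N = 0 to get dT = -F / lambda.
    forcing_multiple=2.0 assumes the run quadrupled CO2 and that the
    forcing is twice that of a doubling (an assumption of this sketch).
    """
    forcing, feedback = linear_fit(delta_t, net_toa)
    return -forcing / feedback / forcing_multiple, forcing, feedback


def synthetic_abrupt_run(years=150, forcing=7.4, feedback=-1.2,
                         tau=30.0, noise=0.0, seed=0):
    """Made-up anomalies obeying N = F + lambda * dT, plus optional noise."""
    rng = random.Random(seed)
    equilibrium = -forcing / feedback
    delta_t, net_toa = [], []
    for year in range(1, years + 1):
        # Hypothetical single-time-scale relaxation toward equilibrium.
        dt = equilibrium * (1.0 - math.exp(-year / tau))
        delta_t.append(dt)
        net_toa.append(forcing + feedback * dt + rng.gauss(0.0, noise))
    return delta_t, net_toa


delta_t, net_toa = synthetic_abrupt_run(noise=0.3)
ecs, forcing, feedback = effective_sensitivity(delta_t, net_toa)
print(f"F = {forcing:.2f} W m-2, lambda = {feedback:.2f} W m-2 K-1, "
      f"ECS = {ecs:.2f} K")
```

In this idealized case the relationship is exactly linear, so the regression recovers the prescribed values apart from noise. Real model output need not follow a straight line, because feedbacks can change as the warming pattern evolves, which is one plausible reason a regression over a limited number of years and a multimillennial equilibrated run can give different answers.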

Correction (12 November 2020): The authors mistakenly cited the wrong article as reference 28 in an earlier version. The correct reference, “The Community Earth System Model Version 2 (CESM2)” by G. Danabasoglu et al., has been added to the PDF and HTML (full text).

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We acknowledge the contributions from the CMIP6 modeling groups, including specific inputs from M. Bensen, O. Boucher, X. Chen, J. Dunne, A. Gettelman, C. Golaz, S. Gualdi, T. Hajima, J. Jungclaus, Y. Lim, S. Marsland, S. Picknell, M. Ringer, M. Schulz, T. Semler, E. Volodin, M. Watanabe, K. Wyser, and S. Yukimoto. We also acknowledge discussions with L. Donner and comments from R. Knutti and two anonymous reviewers. Portions of this study were supported by the Helmholtz Society project Advanced Earth System Model Evaluation for CMIP (EVal4CMIP). The computational resources of the Deutsches Klima RechenZentrum (DKRZ, Hamburg) and the integration of the ESMValTool into the ESGF infrastructure at DKRZ were essential for calculating the CMIP5 and CMIP6 ECS and TCR values here and are acknowledged. Funding: Portions of this study also were supported by the Regional and Global Model Analysis (RGMA) component of the Earth and Environmental System Modeling Program of the U.S. Department of Energy’s Office of Biological and Environmental Research (BER) via NSF IA 1844590, and also under contract DE-AC52-07NA27344 to the Lawrence Livermore National Laboratory. This work was also supported by the National Center for Atmospheric Research, which is a major facility sponsored by the NSF under cooperative agreement no. 1852977. Author contributions: G.A.M., C.A.S., V.E., G.F., J.-F.L., R.J.S., K.E.T., and M.S. designed the study, contributed ideas to the analysis, and wrote the manuscript. M.S. calculated the ECS and TCR values with ESMValTool v2.0 and created the figures. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are available on the Earth System Grid Federation. The ECS and TCR values have been calculated with ESMValTool v2.0.
