## Abstract

Many studies link long-term fine particle (PM_{2.5}) exposure to mortality, even at levels below current U.S. air quality standards (12 micrograms per cubic meter). These findings have been disputed with claims that the use of traditional statistical approaches does not guarantee causality. Leveraging 16 years of data—68.5 million Medicare enrollees—we provide strong evidence of the causal link between long-term PM_{2.5} exposure and mortality under a set of causal inference assumptions. Using five distinct approaches, we found that a decrease in PM_{2.5} (by 10 micrograms per cubic meter) leads to a statistically significant 6 to 7% decrease in mortality risk. Based on these models, lowering the air quality standard to 10 micrograms per cubic meter would save 143,257 lives (95% confidence interval, 115,581 to 170,645) in one decade. Our study provides the most comprehensive evidence to date of the link between long-term PM_{2.5} exposure and mortality, even at levels below current standards.

## INTRODUCTION

The Clean Air Act requires the U.S. Environmental Protection Agency (EPA) to set National Ambient Air Quality Standards (NAAQS). In 1971, EPA set the first NAAQS to best protect public health and welfare. NAAQS are periodically reviewed/revised on the basis of scientific evidence, resulting in a steady decrease in fine particle (PM_{2.5}; particles with diameter, ≤2.5 μm) concentrations. The association between long-term exposure to PM_{2.5} and mortality is well documented (*1*–*4*). Recent studies have examined effect estimates among people who are exposed to PM_{2.5} concentrations below the current NAAQS (*3*, *5*). These studies found that exposure to PM_{2.5} below the U.S. standard is associated with an increased mortality risk.

Some scientists, including the current chair of EPA’s Clean Air Scientific Advisory Committee, have argued against including studies that use traditional statistical approaches to informing revisions of the NAAQS and propose focusing on studies that apply causal inference approaches (*6*). Their main criticism is that traditional approaches that include potential confounders as covariates in the regression model do not inform causality. Recently, Goldman and Dominici (*7*) argued that although “causal inference approaches tend to be more robust to violation of assumptions, […] air pollution regulations must be based on existing evidence and demonstrated inference methods that arise from review of existing literature.”

Our study bridges this divide. Using the largest air pollution study cohort to date (nationwide Medicare enrollees 2000–2016), we provide strong evidence of the causal link between long-term PM_{2.5} exposure and mortality under a set of assumptions necessary for causal inference (described in detail in Materials and Methods). We applied traditional and causal inference approaches to the same data, assessed sensitivity to modeling assumptions and study period, and made statistical software available.

## RESULTS

We obtained open cohort data for more than 68.5 million Medicare enrollees (65 years of age or older) from 2000 to 2016 (*8*), including demographic information on age, sex, race/ethnicity, date of death, and residential zip code. Each person was tracked through a unique patient ID. Annual PM_{2.5} exposure estimates were obtained from a previously developed and validated ensemble prediction model (Fig. 1) (*9*). The model ensembles three machine learning algorithms to predict daily PM_{2.5} at 1-km^{2} grid resolution across the United States using remote sensing, output from chemical transport models, meteorological and land-use variables, and other predictors. As residential addresses for Medicare enrollees are available at the zip code level, we used zonal statistics to aggregate exposure estimates to the zip code level to perform the health analyses. To adjust for confounding, we considered 10 zip code– and county-level confounders including zip code–level socioeconomic status (SES) indicators from the 2000 and 2010 Census and the 2005–2012 American Community Surveys (ACS) and county-level information from the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System (BRFSS). We considered the following zip code–level meteorological variables: summer (June to September) and winter (December to February) average of (i) maximum daily temperatures and (ii) relative humidity in each zip code obtained from gridMET via Google Earth Engine. Materials and Methods and table S1 include details on the variables included in the analyses and the R code for the implementation of the methods (section S4). All study data sources are publicly available.

We implemented five statistical approaches to estimating the effect of PM_{2.5} exposure on mortality, accounting for potential confounders. The two traditional approaches rely on regression modeling for confounding adjustment: (i) Cox proportional hazards model and (ii) Poisson regression. We created multiple observations for each individual, each observation representing a person year of follow-up. We fit Cox hazards models, using follow-up year as the time metric and annual PM_{2.5} as the time-varying exposure, stratifying by age (5-year categories), sex, race/ethnicity, and Medicaid eligibility (a surrogate for individual-level SES). We fit Poisson models for aggregated outcomes (i.e., counts of death) by zip code and year, stratified by the same individual-level characteristics and follow-up year. We adjusted both for confounding by including 10 zip code– or county-level risk factors, four zip code–level meteorological variables, and indicators for geographic region (Northeast, South, Midwest, and West). To account for long-term time trends, we included calendar year as a categorical variable.

We also considered three approaches for causal inference that rely on the potential outcomes framework and generalized propensity scores (GPSs). These approaches adjust for confounding using (i) matching by GPS, (ii) weighting by GPS, and (iii) adjustment by GPS, by including GPS as a covariate in the health outcome model. The GPSs were estimated by regressing the PM_{2.5} exposure on confounders using a gradient boosting machine, including the same confounders as in the traditional approaches. Following GPS implementation, we fit a Poisson regression model stratified by individual-level characteristics and follow-up year, as if it was a stratified randomized experiment. All approaches are described in detail in Materials and Methods and in section S1.

All five approaches were fit on the 2000–2016 data. The 2000–2016 cohort consisted of 68,503,979 individuals (573,370,257 person years); we observed 27,106,639 deaths (39.6%; Table 1). Figure 1 shows the average PM_{2.5} concentrations in 2000 and 2016. To estimate low-level PM_{2.5} effects on mortality, we applied the five statistical approaches, restricting analyses to the subpopulation of Medicare enrollees who were always exposed to PM_{2.5} levels lower than 12 μg/m^{3} over the entire study period. Additional analysis was conducted on the previously used 2000–2012 cohort. To evaluate the model sensitivity to potential unmeasured confounders that vary over time, all five approaches were fit twice, once with year as a covariate (the main analysis) and once without (as a sensitivity analysis). Results for all additional analyses are shown in section S3. Effect estimates are presented as hazard ratios (HRs) per increase (10 μg/m^{3}) in annual PM_{2.5}. Ninety-five percent confidence intervals (CIs) for all models were evaluated by m-out-n blocked bootstrap to account for spatial correlation. More specifically, we recalculated the GPS and refit the outcome model in each bootstrapped sample to ensure that the bootstrapping procedure jointly accounted for the variability associated both with the GPS estimation and with the outcome model (see section S1).

The causal inference framework lends itself to the evaluation of covariate balance for measured confounders. The covariate balance indicates the quality of the causal inference approach at recovering randomized experiments and informs the degree to which we can make a valid causal assessment. A detailed discussion is provided in section S1. Covariate balance was evaluated using mean absolute correlation (AC), with values of <0.1 indicating high quality in recovering randomized experiments. Figure 2 shows that AC is smaller than 0.1 using causal inference GPS methods (matching and weighting), thus strengthening the interpretability and validity of our analyses as providing evidence of causality. Additional evaluation of covariate balance estimating the standardized mean difference after dichotomizing the continuous exposure is provided in fig. S3.

For the period 2000–2016, we found that all statistical approaches provide consistent results: A decrease (10 μg/m^{3}) in PM_{2.5} led to a statistically significant decrease in mortality rate ranging between 6 and 7% (= 1 − 1/HR) [HR estimates, 1.06 (95% CI, 1.05 to 1.08) to 1.08 (95% CI, 1.07 to 1.09)]. The estimated HRs were larger when studying the cohort of Medicare enrollees that were always exposed to PM_{2.5} levels lower than 12 μg/m^{3} [1.23 (95% CI, 1.18 to 1.28) to 1.37 (95% CI, 1.34 to 1.40)] (Fig. 3; corresponding numbers are shown in table S3). Similar results were found for the period 2000–2012 (table S3), also showing consistency with Di *et al.* (*3*). We reanalyzed all data excluding year as a covariate. The estimated HRs were larger in magnitude, potentially indicating residual confounding bias by some unmeasured confounders with time trends that covary with time trends in the outcome and exposure (table S3). We conducted further sensitivity analyses to unmeasured confounding by calculating the E-value (*10*, *11*). The results, shown in table S5, suggest that our conclusions are overall robust to unmeasured confounding bias.

We estimated the total number of deaths avoided among elderly in a decade if, hypothetically, the U.S. standards followed the World Health Organization (WHO) annual guideline of 10 μg/m^{3} and all zip codes complied. For this calculation, we used the most conservative HR estimate across all statistical approaches [HR, 1.06 (95% CI, 1.05 to 1.08) and 1.23 (95% CI, 1.18 to 1.28)]. We found that lowering the standards to 10 μg/m^{3} would have saved 143,257 lives (95% CI, 115,581 to 170,645) in one decade.

## DISCUSSION

This study provides the most robust and reproducible evidence to date on the causal link between exposure to PM_{2.5}, even at levels below 12 μg/m^{3}, and mortality among Medicare enrollees. Considering (i) the massive study population, (ii) the numerous sensitivity analyses, and (iii) the transparent assessment of covariate balance that indicates the quality of causal inference for recovering randomized experiments, we conclude that long-term PM_{2.5} exposure is causally related to mortality. This conclusion assumes that the causal inference assumptions hold and, more specifically, that we were able to adequately account for confounding bias. We explored various modeling approaches and conducted extensive sensitivity analyses and found that the results were robust across approaches and models. This work relies on publicly available data, and we provide code that allows for reproducibility of our analyses.

Both traditional and causal inference approaches rely on assumptions. Unless all assumptions are satisfied, regardless of approach, recovery of causal effects is not guaranteed. A critical assumption that guarantees our conclusion’s validity is that our statistical analyses account for all confounders. This assumption must always be made in observational studies. We included several publicly available individual- and area-level potential confounders. To mitigate unmeasured confounding bias, we assessed the results’ sensitivity by including year as a surrogate for some unmeasured confounders that might have covaried over time with PM_{2.5} and mortality and, thus, confound their association. Even after adjustment for year, the analysis could be affected by confounding bias by unmeasured factors; therefore, we conducted further sensitivity analyses to unmeasured confounding by calculating the E-value and showed that our results are robust to unmeasured confounding bias.

Dominici and Zigler (*12*) previously discussed three notions of what constitutes evidence of causality in air pollution epidemiology. The first is causality inferred from evidence of biological plausibility (*13*). The second is consistency of results across many epidemiological studies and adherence to Bradford Hill causal criteria (*14*). The third is the use of causal inference methods that are more robust to model misspecification compared to traditional approaches and, when assumptions are met, can isolate causal relationships. More specifically, the causal inference approaches considered in this work require the estimation of GPS as the first step. Assuming all causal inference assumptions hold, these approaches are more robust to outcome model misspecification and allow for the transparent evaluation of covariate balance. However, note that if the models are accurately specified and all assumptions are met, then the traditional approaches have the potential to inform causal relationship as well. In particular, we found that a more flexible regression model specification may help adequately adjust for confounding; when implementing these flexible models, we observed similar results compared to the causal inference approaches.

This work estimates the causal relationship using causal inference methods, addressing just one of Dominici and Zigler’s three notions of what constitutes scientific evidence of causality. The collective evidence across studies conducted in different populations, using different study designs and methods, is also imperative to inform regulatory action. A recent meta-analysis found robust evidence for an effect on mortality across 52 cohort studies at PM_{2.5} levels below 10 μg/m^{3} (*15*).

Exposure to PM_{2.5} was estimated from a prediction model, which, while very good, is not perfect. The PM_{2.5} exposure prediction model developed by Di *et al.* (*9*) that was used in this analysis indicated excellent model performance, with a 10-fold cross-validated *R*^{2} of 0.89 for annual PM_{2.5} predictions. However, exposure error could have affected all HR estimates. In the original study by Di *et al.* (*3*), the authors assessed the robustness of the results to the exposure predictions by repeating the analysis based on PM_{2.5} exposure data obtained from 1928 EPA ambient monitors. The additional analysis was restricted to the subpopulation of individuals within 50 km of these monitors. While this subset does not represent the entire population, we found that the analysis based on nearest monitoring site led to an HR estimate that was only slightly lower than the one obtained using the exposure prediction model [i.e., 1.061 (95% CI, 1.059 to 1.063) versus 1.073 (95% CI, 1.071 to 1.075)]. Although these results are reassuring, we recognize that they are not a substitute to a formal analysis that accounts for exposure error.

Accounting for exposure measurement error under a causal inference framework using propensity scores is complex, as the exposure error will affect not only the estimation of the health effects but also the estimation of the propensity score and its implementation to adjust for measured confounders (*16*). Regression calibration is a common method for measurement error correction (*17*). Wu *et al.* (*18*) proposed a regression calibration approach for GPS analysis under categorical exposures. The proposed approach was applied in the context of long-term PM_{2.5} exposure and mortality using the Medicare data in the Northeastern United States. When accounting for exposure error, there was a higher and still statistically significant association between exposure to PM_{2.5} and mortality, although with larger CIs. How to propagate exposure error under a causal inference framework for a continuous exposure is still an area of active research; the presence of exposure measurement error could induce a bias toward the null in all of our estimates (*19*).

The model parameterization assumes that zip code–specific information is spatially independent, given covariates. Since we adjusted for numerous zip code–level predictors of mortality, including SES and meteorological variables, this assumption is likely to hold. If any residual spatial dependence remains under certain assumptions (e.g., those used in generalized estimating equations), then it would not have affected our point estimates but could have influenced the estimated SEs. However, our bootstrapping procedure partially accounts for this possibility. By randomly sampling zip codes for each bootstrap replicate, we were able to break down spatial dependence given covariates. Therefore, it is unlikely that our results are affected by spatial correlation.

Our study is based on publicly available data sources and have made all code developed for our analyses publicly available. Our approach maximizes reproducibility and transparency. We provide robust evidence that the current U.S. standards for PM_{2.5} concentrations are not protective enough and should be lowered to ensure that vulnerable populations, such as the elderly, are safe.

Our results raise awareness of the continued importance of assessing the impact of air pollution exposure on mortality. There are currently numerous disputes regarding the evidence from previous air pollution epidemiologic studies, with arguments made for only using causal inference methods or only including studies that make participants’ information publicly available. We oppose these very strongly. Most epidemiological studies must rely on confidential patient data to provide evidence on adverse health effects of environmental exposures on outcomes and also focus on populations that cannot be studied using administrative data. We hope this work will help researchers and policy makers, particularly as revision discussions of national PM_{2.5} standards are underway.

## MATERIALS AND METHODS

### Study population

Our study population is composed of more than 68.5 million Medicare enrollees (≥65 years of age) between 2000 and 2016. Medicare claims data, obtained from the Centers for Medicare and Medicaid Services (*8*), are an open cohort, including demographic information such as age, sex, race/ethnicity, date of death, and residential zip code. A unique patient ID is assigned to each person to allow for tracking over time. Medicare enrollees entered our cohort in 2000 if enrolled before 2000, or upon their enrollment after 2000. After enrollment, each individual was followed annually until the year of their death or the end of our study period (31 December 2016). This study was conducted under a protocol approved by the Harvard T.H. Chan School of Public Health Human Subjects Committee.

### Exposure assessment

We estimated daily PM_{2.5} levels at a high spatiotemporal resolution using a 1-km^{2} grid network across the contiguous United States and a well-validated ensemble-based prediction model (*9*). This model used ensemble learning approaches to combining three machine learning models: a random forest regression, a gradient boosting machine, and an artificial neural network (*20*–*22*). These machine learning algorithms used more than 100 predictor variables from satellite data, land-use information, weather variables, and output from chemical transport model simulations. The ensemble-based model was trained on daily PM_{2.5} concentrations measured at 2156 U.S. EPA monitoring sites, with an average cross-validated *R*^{2} of 0.86 for daily PM_{2.5} predictions and 0.89 for annual predictions, indicating excellent performance that was improved compared to previously developed models (*23*, *24*).

Residential addresses are not available for Medicare enrollees, only residential zip codes. For each standard zip code, we used zonal statistics to calculate the daily average PM_{2.5} concentration based on all 1-km^{2} grid cell predictions within the zip code via aggregations. More specifically, we first overlaid the zip code boundaries to the 1-km^{2} grid cells and then averaged the predictions at 1-km^{2} grid cells whose centroids fall within the boundary of that zip code (*25*). For P.O. Box–only zip codes, the average PM_{2.5} concentrations were calculated by linking to the predictions from the nearest 1-km^{2} grid cell. Annual zip code averages were estimated by averaging the daily concentrations. We assigned the annual estimated zip code average PM_{2.5} concentration to individuals who resided in that zip code for each calendar year.

### Potential confounders

The stratification by individual-level characteristics also adjusted for potential confounding by these variables. Furthermore, to adjust for confounding bias by community-level factors, we used information about multiple zip code–level SES variables collected from the U.S. Census, ACS, and BRFSS. All data [including census data, which are reported for Zip Code Tabulation Area (ZCTA)] were mapped to postal zip codes. Specifically, we included (i) two county-level variables: average body mass index and smoking rate; (ii) eight zip code–level census variables: proportion of Hispanic residents, proportion of Black residents, median household income, median home value, proportion of residents in poverty, proportion of residents with a high school diploma, population density, and proportion of residents that own their house; and (iii) four zip code–level meteorological variables: the summer (June to September) and winter (December to February) averages of maximum daily temperatures and relative humidity. We obtained zip code–level meteorological variables using area-weighted aggregations based on daily temperature and humidity data on 4-km^{2} gridded rasters from Gridmet via Google Earth Engine (*26*, *27*). We also considered two indicator variables indicating (i) the four census geographic regions of the United States (Northeast, South, Midwest, and West) and (ii) calendar years (2000–2016) to adjust for some residual or unmeasured spatial and temporal confounding, respectively. The data used for this study are publicly available and sources are listed in table S1.

### Data linkage

Outcome data were available at the postal zip code level, at which we also assigned annual PM_{2.5} exposures. Outcome and exposure information were available for 35,924 zip codes. We then mapped potential confounders at ZCTA to postal zip codes to link the outcome and exposure data to potential confounders obtained from the U.S. Census, ACS, and the BRFSS. The total number of zip codes included in our main analysis with information about all outcome, exposure, and confounder data was 31,337. See section S2 for more details.

### Statistical analysis

For all models, we performed stratified outcome model analysis by four individual-level characteristics: (i) a 5-year category of age at entry (65 to 69, 70 to 74, 75 to 79, 80 to 84, 85 to 89, 90 to 94, 95 to 99, and above 100 years of age), (ii) race/ethnicity (White, Black, Asian, Hispanic, Native American, and other), (iii) sex (male or female), and (iv) an indicator variable for Medicaid eligibility (a surrogate for individual-level SES).

We fit five different statistical models to estimate the causal relationship between long-term PM_{2.5} exposure and our outcome of interest, all-cause mortality among the elderly. Below, we describe the five approaches; details on the statistical methods and assumptions are provided in section S1.

*Cox proportional hazard approach*. We fit stratified Cox proportional hazards models using annual PM_{2.5} as the time-varying exposure and stratifying by four individual-level characteristics. In our main analysis, we adjusted for 14 zip code– or county-level time-varying covariates, as well as a dummy region variable and dummy calendar year variable. The Cox proportional hazards survival model is specified as Survival(follow-up year, death) ~ PM_{2.5} + area-level risk factors + meteorological variables + dummy year + dummy region + strata(age, race, gender, Medicaid eligibility).

*Poisson regression approach*. We fit the Poisson regression model, using annual PM_{2.5} as the time-varying exposure; the count of deaths at the given follow-up year, calendar year, and zip code as the outcome and the corresponding total person-time as the offset term. To adjust for potential confounding, we included the same 14 zip code– or county-level time-varying covariates, as well as a dummy region variable and dummy calendar year variable, as those included in the Cox proportional hazards models. We used a stratified Poisson regression model formulation to account for the strata-specific baseline risk rates by stratifying on individual-level characteristics. The Poisson regression model is specified as log(Ε[death counts]) ~ PM_{2.5} + area-level risk factors + meteorological variables + dummy year + dummy region + strata(age, race, gender, Medicaid eligibility, follow-up year) + offset(log[person year]).

*Causal inference approaches*.

##### GPS estimation

The three proposed causal inference approaches required the estimation of GPS as the first step. In our study, we modeled the conditional density of exposure (i.e., annual average PM_{2.5}) on the 14 zip code– or county-level time-varying covariates, as well as a dummy region variable and dummy calendar year variable, using gradient boosting machine with normal residuals (*28*, *29*). The gradient boosting machine model is specified as PM_{2.5} ~ area-level risk factors + meteorological variables + dummy year + dummy region + ε, where ε ~ *N*(0, σ^{2}).

##### GPS matching approach

We constructed the matched pseudo-population as described in section S1. We first checked the covariate balance in the matched pseudo-population, and if covariate balance was achieved (average AC, <0.1), then we fit a univariate Poisson regression model regressing the death counts with an offset person-time term, on the exposure PM_{2.5}, stratifying by four individual-level characteristics and the same follow-up year. The Poisson regression model is specified as log(*E*[death counts]) ~ PM_{2.5} + strata(age, race, gender, Medicaid eligibility, follow-up year) + offset(log[person year]), on the matched pseudo-population. Additional details on the matching procedure can be found in section S1.

##### GPS weighting approach

We constructed the weighted pseudo-population as described in section S1. We first checked the covariate balance on the weighted pseudo-population, and if covariate balance was achieved (average AC, <0.1), then we fit a weighted univariate Poisson regression model regressing the death count with offset term the person-time on PM_{2.5} exposure incorporating the assigned weights and stratifying by the four individual-level characteristics and the same follow-up year. The Poisson regression model is specified as log(*E*[death counts]) ~ PM_{2.5} + strata(age, race, gender, Medicaid eligibility, follow-up year) + offset(log[person year]), weights = *f*(PM_{2.5})/GPS, where *f*(PM_{2.5}) is the marginal density function of exposure PM_{2.5}, which serves as a stabilizing term (*30*).

##### GPS adjustment approach

We modeled the conditional expectation of the death counts given the exposure and the estimated GPS as a stratified Poisson regression with flexible formulation of bivariate variables, with the corresponding person-time offset. The Poisson regression model was specified as log(*E*[death counts]) ~ PM_{2.5} + PM_{2.5} × GPS + GPS + GPS^{2} + strata(age, race, gender, Medicaid eligibility, follow-up year) + offset(log[person year]). In contrast to the GPS matching/weighting approaches, where the analysis is complete after fitting the Poisson regression model, for the GPS adjustment approach, the coefficients from the Poisson regression model do not provide any causal interpretation; instead, the causal outcome analysis is conducted on the counterfactuals predicted by the Poisson model (*31*). We fit a univariate linear regression model regressing the counterfactual mean hazard rates for each PM_{2.5} level, stratifying by four individual-level characteristics and the same follow-up year. The outcome linear regression model is specified as *E*(hazard rates) ~ PM_{2.5} + strata(age, race, gender, Medicaid eligibility, follow-up year). Additional details are provided in section S1.

*Total events avoided*. We estimated the total number of deaths that would be avoided among the elderly per decade if all areas were in compliance with the current WHO guidelines (annual PM_{2.5} exposure, ≤10 μg/m^{3}) (*32*). Nethery *et al*. (*33*) defined and identified a causal quantity named the total events avoided (TEA) under causal assumptions 1 to 3 (see section S1), defined as the difference in the expected number of health events under the counter-factual pollution exposures and the observed number of health events under the factual pollution exposures. Such a causal quantity is particularly related to the health policy that intends to answer the question “How many deaths were avoided in the Medicare population per decade due to the U.S. National Ambient Air Quality Standards (NAAQS) changes in particulate matter (PM_{2.5}) in the same time?”

We created the counterfactual PM_{2.5} exposures if all zip codes in the continental United States complied with the current WHO guidelines (annual PM_{2.5}, ≤10 μg/m^{3}). For zip codes that did not comply with the standard until 2016, their counterfactual was assumed to be exposure exactly at this hypothesized standard (10 μg/m^{3}). This is a conservative estimate, as it answers the question of TEA if these zip codes were exactly at 10 μg/m^{3} and not lower than this concentration. For zip codes already in compliance, we assumed that their concentration was unchanged, which otherwise would result in even higher TEA.

We compared this counterfactual scenario to the factual scenarios during the most recent decade (2007–2016). For zip codes with an annual PM_{2.5} concentration of >12 μg/m^{3}, the numbers for the TEA were obtained using the most conservative HR from our main analysis [HR, 1.06 (95% CI, 1.05 to 1.08); see table S3]. For zip codes with an annual PM_{2.5} concentration of 10 to 12 μg/m^{3}, the numbers for the TEA were obtained using the most conservative HR from our low-level analysis [HR, 1.23 (95% CI, 1.18 to 1.28)]. Zip codes with an annual PM_{2.5} concentration of <10 μg/m^{3} did not contribute to the TEA. For the CI calculation, we used the lower and upper bounds of the 95% CIs from the HR estimates (which were obtained by bootstrap).

*Evaluation of unmeasured confounding*. We conducted a sensitivity analysis to evaluate the robustness of our results to unmeasured confounding by calculating the E-value. The E-value for the point estimate of interest (in our case, the HR) can be defined as the minimal strength of an association, on the risk ratio scale, that an unmeasured confounder would need to have with both the exposure and outcome, conditional on the covariates already included in the model, to fully explain the observed association under the null. We calculated the E-value for our reported HRs per increase (10 μg/m^{3}) of long-term exposure to PM_{2.5}. The calculation of E-value can be implemented through the E-value calculator by Mathur *et al.* (*11*), available at https://www.evalue-calculator.com/.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/29/eaba5692/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:**We would like to thank L. Goodwin for editorial assistance in the preparation of this manuscript. The computations in this paper were run on the Odyssey cluster, supported by the FAS Division of Science, Research Computing Group at Harvard University, and the Research Computing Environment, supported by the Institute for Quantitative Social Science in the Faculty of Arts and Sciences at Harvard University.

**Funding:**This work was made possible by the support from NIH grants R01 ES024332-01A1, P50 MD010428, R21 ES024012, R01 ES026217, R01 ES028033, MD012769, R01 ES030616, and P30 ES09089; HEI grant 4953-RFA14-3/16-4; and USEPA grants 83587201-0 and RD-83479801. The contents are solely the responsibility of the grantee and do not necessarily represent the official views of the funding agencies. Further, the funding agencies do not endorse the purchase of any commercial products or services related to this publication.

**Author contributions:**X.W.: study design, analysis, and writing. D.B.: study design and writing. J.S.: data preparation. M.A.K.: study design and writing. F.D.: study design and writing.

**Competing interests:**J.S. served as an expert witness for the U.S. Department of Justice in a Clean Air Act violation case. The authors declare that they have no other competing interests.

**Data and materials availability:**All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Those interested in the original data and code can contact the corresponding author. Additional data related to this paper is available on https://github.com/wxwx1993/National_Causal.

- Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).