Research Article

CORONAVIRUS

Air pollution and COVID-19 mortality in the United States: Strengths and limitations of an ecological regression analysis

Science Advances  04 Nov 2020:
Vol. 6, no. 45, eabd4049
DOI: 10.1126/sciadv.abd4049

Abstract

Assessing whether long-term exposure to air pollution increases the severity of COVID-19 health outcomes, including death, is an important public health objective. Limitations in COVID-19 data availability and quality remain obstacles to conducting conclusive studies on this topic. At present, COVID-19 outcome data for representative populations are publicly available only as area-level counts. Therefore, studies of long-term exposure to air pollution and COVID-19 outcomes using these data must use an ecological regression analysis, which precludes controlling for individual-level COVID-19 risk factors. We describe these challenges in the context of one of the first preliminary investigations of this question in the United States, where we found that higher historical PM2.5 exposures are positively associated with higher county-level COVID-19 mortality rates after accounting for many area-level confounders. Motivated by this study, we lay the groundwork for future research on this important topic, describe the challenges, and outline promising directions and opportunities.

INTRODUCTION

The suddenness and global scope of the coronavirus disease 2019 (COVID-19) pandemic have raised urgent questions that require coordinated investigation to slow the disease’s devastation. A critically important public health objective is to identify key modifiable environmental factors that may contribute to the severity of health outcomes [e.g., intensive care unit (ICU) hospitalization and death] among individuals with COVID-19. Numerous scientific studies reviewed by the U.S. Environmental Protection Agency (EPA) have linked fine particles (PM2.5; particles with diameter ≤2.5 μm) to a variety of adverse health events (1) including death (2). It has been hypothesized that because long-term exposure to PM2.5 adversely affects the respiratory and cardiovascular systems and increases mortality risk (3–5), it may also exacerbate the severity of COVID-19 symptoms and worsen the prognosis of this disease (6).

Epidemiological research estimating the association between long-term exposure to air pollution and COVID-19 hospitalization and death is a rapidly expanding area that is attracting attention around the world. Two studies have been published using data from European countries (7, 8), and many more are available as preprints. However, because of the unprecedented nature of the pandemic, researchers face serious challenges when conducting these studies. One key challenge is that, to our knowledge, individual-level data on COVID-19 health outcomes for large, representative populations are not publicly available or accessible to the scientific community. Therefore, the only way to generate preliminary evidence on the link between PM2.5 and COVID-19 severity and outcomes using the available aggregate data is to use an ecological regression analysis. With this study design, publicly available area-level COVID-19 mortality rates are regressed against area-level air pollution concentrations while accounting for area-level potential confounding factors. Here, we discuss the strengths and limitations of conducting ecological regression analyses of air pollution and COVID-19 health outcomes and describe additional challenges related to evolving data quality, statistical modeling, and control of measured and unmeasured confounding, paving the way for future research on this topic. We illustrate these challenges in the context of a specific study, in which we investigated the impact of long-term PM2.5 exposure on COVID-19 mortality rates in 3089 counties in the United States, covering 98% of the population.

Illustration of an ecological regression analysis of historical exposure to PM2.5 and COVID-19 mortality rate

We begin by describing how to conduct an ecological regression analysis in this setting. COVID-19 death counts (a total of 116,747 deaths) were obtained from the Johns Hopkins University Coronavirus Resource Center and were cumulative up to 18 June 2020. We used data from 3089 counties, of which 1244 (40.3%) had reported zero COVID-19 deaths at the time of our analysis. Daily PM2.5 concentrations were estimated across the United States on a 0.01° × 0.01° grid for the period 2000–2016 using well-validated atmospheric chemistry and machine learning models (9). We used zonal statistics to aggregate PM2.5 concentration estimates to the county level and then averaged across the period 2000–2016 to perform health outcome analyses. Figure 1 illustrates the spatial variation in 2000–2016 average (hereafter referred to as “long-term average”) PM2.5 concentrations and COVID-19 mortality rates (per 1 million population) by county.
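The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: it assumes each grid cell has already been assigned to a county (the spatial overlay is not shown), and the function names, county labels, and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def county_long_term_average(cells):
    """Average gridded PM2.5 to the county level, then across years.

    cells: iterable of (county_id, year, pm25) tuples, one per grid cell
    and year, with the cell-to-county assignment assumed to be done.
    Returns {county_id: mean over years of the county-year mean}.
    """
    by_county_year = defaultdict(list)
    for county_id, year, pm25 in cells:
        by_county_year[(county_id, year)].append(pm25)

    yearly_means = defaultdict(list)
    for (county_id, _year), values in by_county_year.items():
        yearly_means[county_id].append(fmean(values))

    return {county: fmean(means) for county, means in yearly_means.items()}


def deaths_per_million(deaths, population):
    """Cumulative deaths expressed per 1 million population."""
    return deaths / population * 1_000_000


# Hypothetical toy input: two counties, two years, a handful of grid cells.
cells = [
    ("A", 2000, 10.0), ("A", 2000, 12.0),
    ("A", 2001, 8.0),
    ("B", 2000, 6.0), ("B", 2001, 4.0),
]
long_term = county_long_term_average(cells)
rate_a = deaths_per_million(deaths=5, population=50_000)
```

Averaging within each county-year first, and only then across years, keeps a year with more grid cells from dominating the long-term average.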

Fig. 1 National maps of historical PM2.5 concentrations and COVID-19 deaths.

Maps show (A) county-level 17-year long-term average of PM2.5 concentrations (2000–2016) in the United States in μg/m3 and (B) county-level number of COVID-19 deaths per 1 million population in the United States up to and including 18 June 2020.

We fit a negative binomial mixed model using COVID-19 mortality rates as the outcome and long-term average PM2.5 as the exposure of interest, adjusting for 20 county-level covariates. We conducted more than 80 sensitivity analyses to assess the robustness of the findings to various modeling assumptions. We found that an increase of 1 μg/m3 in the long-term average PM2.5 is associated with a statistically significant 11% (95% CI, 6 to 17%) increase in the county’s COVID-19 mortality rate (see Table 1); this association continues to be stable as more data accumulate (fig. S3). We also found that population density, days since the first COVID-19 case was reported, median household income, percent of owner-occupied housing, percent of the adult population with less than high school education, age distribution, and percent of Black residents are important predictors of the COVID-19 mortality rate in the model. We found a 49% (95% CI, 38 to 61%) increase in COVID-19 mortality rate associated with a 1-SD (14.1%) increase in percent Black residents of the county. Details on the data sources, statistical methods, and analyses are summarized in the Supplementary Materials. All data sources used in the analyses, along with fully reproducible code, are publicly available at https://github.com/wxwx1993/PM_COVID.
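To make the reported effect sizes easier to read, the sketch below converts between a mortality rate ratio (MRR), the corresponding percent change, and the underlying log-scale coefficient. It assumes the usual log link for a negative binomial regression, under which the MRR is the exponential of the coefficient and effects multiply across units of exposure; the exact model specification is given in the Supplementary Materials and is not reproduced here. The helper names are hypothetical.

```python
import math


def percent_change(mrr):
    """Percent change in the mortality rate implied by a rate ratio."""
    return (mrr - 1.0) * 100.0


def log_coefficient(mrr):
    """Log-scale regression coefficient, assuming a log link (MRR = exp(beta))."""
    return math.log(mrr)


def mrr_for_change(mrr_per_unit, delta):
    """MRR for a change of `delta` exposure units, assuming log-linearity."""
    return mrr_per_unit ** delta


# Reported estimate: an 11% increase per 1 ug/m3, i.e. an MRR of 1.11.
pm25_mrr = 1.11
beta = log_coefficient(pm25_mrr)             # roughly 0.104 on the log scale
two_unit_mrr = mrr_for_change(pm25_mrr, 2.0)
two_unit_pct = percent_change(two_unit_mrr)  # about 23%, not 22%
```

Because the effect is multiplicative under this assumption, a 2 μg/m3 difference corresponds to slightly more than twice the 1 μg/m3 percent increase.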

Table 1 Mortality rate ratios (MRR), 95% confidence intervals (CI), and P values for all variables in the main analysis.

Details of the statistical models are available in section S2. Q, quintile.

Strengths and limitations of an ecological regression analysis

Ecological regression analysis provides a simple and cost-effective approach for studying potential associations between historical exposure to air pollution and increased vulnerability to COVID-19 in large representative populations, as illustrated in our study in the previous section. This approach is regularly applied in many areas of research (10). Using our study as an example, we summarize in Table 2 the strengths, limitations, and opportunities considering (i) study design, (ii) COVID-19 health outcome data, (iii) historical exposure to air pollution, and (iv) measured and unmeasured confounders, with the goal of paving the way for future research.

Table 2 Strengths and limitations of ecological regression analyses applied to research on air pollution and COVID-19 and opportunities for future research.

Among the key limitations, by design, ecological regression analyses are unable to adjust for individual-level risk factors (e.g., age, race, and smoking status); when individual-level data are unavailable, this approach leaves us unable to draw conclusions regarding individual-level associations. In the context of COVID-19 health outcomes, this is a severe limitation, as individual-level risk factors are known to affect COVID-19 health outcomes. It is important to note that conflating ecological associations with individual-level associations constitutes an ecological fallacy. In extreme cases, this fallacy can lead to associations detected in ecological regression that do not exist or are in the opposite direction of true associations at the individual level. However, ecological regression analyses still allow us to draw conclusions at the area level, which can be useful for policy-making (11). For the association between COVID-19 health outcomes and PM2.5 exposure, we argue that area-level conclusions are valuable, as they can inform important immediate policy actions that will benefit public health, such as (i) prioritization of precautionary measures [e.g., personal protective equipment (PPE) allocations and hospital beds] to areas with historically higher air pollution and (ii) further strengthening the scientific argument for lowering the U.S. National Ambient Air Quality Standards for PM2.5 and other pollutants. To completely avoid potential ecological bias, a representative sample of individual-level data is necessary. While this may not be feasible in the near future, as some COVID-19 outcome data become available at the individual level, existing approaches that augment county-level data with individual-level data (12) could be used to correct for ecological bias.
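The possibility that an area-level association points in the opposite direction from the individual-level one can be shown with a small synthetic example. The data below are entirely hypothetical and constructed so that the outcome rises with the area mean exposure but falls with an individual's deviation from that mean; they are not drawn from the study.

```python
from statistics import fmean


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical areas: outcome = 2 * (area mean) - 1 * (individual offset).
areas = {}
for area_mean in (10.0, 20.0, 30.0, 40.0):
    people = []
    for offset in (-2.0, -1.0, 0.0, 1.0, 2.0):
        exposure = area_mean + offset
        outcome = 2.0 * area_mean - 1.0 * offset
        people.append((exposure, outcome))
    areas[area_mean] = people

# Ecological analysis: regress area mean outcome on area mean exposure.
area_x = [fmean([x for x, _ in people]) for people in areas.values()]
area_y = [fmean([y for _, y in people]) for people in areas.values()]
area_level_slope = ols_slope(area_x, area_y)

# Individual-level analysis within each area.
within_area_slopes = [
    ols_slope([x for x, _ in people], [y for _, y in people])
    for people in areas.values()
]
```

In this construction the area-level slope is positive while every within-area slope is negative, so an analyst who sees only the aggregated data would infer the wrong sign for the individual-level relationship.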

Furthermore, air pollution exposure misclassification, due to between-area mobility and within-area variation, is another potential source of bias that could affect the ecological regression results described in our example study. Methods to account for the propagation of exposure error into the ecological regression model (13) could be applied to help mitigate the impact of measurement error. Outcome misclassification is another limitation that can be partially overcome by accessing nationwide registry data with the validated cause of death (14). As in all observational studies, adjustment for measured and unmeasured confounding presents another key challenge in ecological regression analyses, which may be exacerbated when dealing with dynamic pandemic data, as in our study. Conducting studies using both traditional regressions and methods for causal inference as in Wu et al. (2) is necessary to assess the robustness of the findings.
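As a simplified illustration of why exposure error matters, consider the textbook case of classical additive measurement error in a simple linear regression. This is an assumption made for exposition only: it is not the method of reference (13), the error structure in area-level exposure estimates may differ (for example, Berkson-type error), and the result holds only approximately in log-linear models. The symbols $X_c$, $W_c$, $U_c$, and $\lambda$ are introduced here and do not appear in the original analysis.

```latex
% Observed exposure W_c equals true exposure X_c plus independent error U_c.
W_c = X_c + U_c, \qquad U_c \perp X_c, \qquad
\operatorname{Var}(X_c) = \sigma_X^2, \quad \operatorname{Var}(U_c) = \sigma_U^2

% Regressing the outcome on W_c instead of X_c attenuates the slope:
\beta^{*} = \lambda \, \beta, \qquad
\lambda = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_U^2} \le 1
```

Under these assumptions the estimated slope is biased toward zero, and the bias grows as the error variance increases relative to the true between-area variation in exposure.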

Increasing the scientific rigor of research in this area requires access to representative, individual-level data on COVID-19 health outcomes, including information about patients’ residential address, demographics, and individual-level confounders. This is an enormous challenge that will require consideration of many privacy, legal, and ethical trade-offs (14). Future areas of research also include the application of statistical methods to quantify and correct for ecological bias and measurement error, reproducible methods for causal inference, and sensitivity analysis of measured and unmeasured confounding bias as suggested above. These strengths and limitations are illustrated further in the context of our own study (see the Supplementary Materials).

DISCUSSION

Ecological regression analyses are crucial to stimulate innovations in a rapidly evolving area of research. Ongoing research has already focused on overcoming some aspects of these limitations (8, 15). For example, ecological regression analysis of air pollution and COVID-19, using data with finer geographic resolution, is being conducted for different countries and regions around the world. Cole et al. (8) published an ecological regression analysis using data in Dutch municipalities and found results consistent with our own investigation; the California Air Resources Board (CARB) is planning to conduct a similar study at the census tract level (15). Although an ecological regression analysis cannot provide insight into the mechanisms underlying the relationship between PM2.5 exposure and COVID-19 mortality, studies are starting to shed light on the potential biological mechanisms that may explain the relationship between air pollution and viral infection outcomes (16). For example, it has been hypothesized that chronic exposure to PM2.5 causes alveolar angiotensin-converting enzyme 2 (ACE-2) receptor overexpression and impairs host defenses (17). This could cause a more severe form of COVID-19 in ACE-2–depleted lungs, increasing the likelihood of poor outcomes, including death (18).

The associations detected in ecological regression analyses provide strong justification for follow-up investigations as more and higher-quality COVID-19 data become available. Such studies would include validation of our findings with other data sources and study types, as well as investigations into mediating factors and effect modifiers, biological mechanisms, impacts of PM2.5 exposure timing, and relationships between PM2.5 and other COVID-19 outcomes such as hospitalization. Research on how modifiable factors may exacerbate COVID-19 symptoms and increase mortality risk is essential to guide policies and behaviors to minimize fatality related to the pandemic. Such research could also provide a strong scientific argument for revision of the U.S. National Ambient Air Quality Standards for PM2.5 and other environmental policies in the midst of a pandemic.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/45/eabd4049/DC1

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: The computations in this paper were run on (i) the Odyssey cluster supported by the FAS Division of Science, Research Computing Group at Harvard University and (ii) the Research Computing Environment supported by the Institute for Quantitative Social Science in the Faculty of Arts and Sciences at Harvard University. We gratefully acknowledge support from the 2020 Star-Friedman Challenge for Promising Scientific Research, the Climate Change Solutions Fund at Harvard University, and the Fernholz Foundation. We would like to thank L. Goodwin and S. Tobin for editorial assistance in the preparation of this manuscript.

Funding: This work was made possible by support from NIH grants R01 ES024332-01A1, P50MD010428, R01 ES026217, R01 ES028033, R01 ES030616, R01 AG066793-01, and R01 MD012769; Health Effects Institute (HEI) grant 4953-RFA14-3/16-4; and US EPA grant 83587201-0. The funding sources did not participate in the design or conduct of the study; collection, management, analysis, or interpretation of the data; or preparation, review, or approval of the manuscript. The research described in this article was conducted under contract to the HEI, an organization jointly funded by the EPA (Assistance Award No. R-83467701) and certain motor vehicle and engine manufacturers. The contents of this article do not necessarily reflect the views of HEI or its sponsors, nor do they necessarily reflect the views and policies of the EPA or motor vehicle and engine manufacturers.

Author contributions: X.W. and R.C.N. contributed equally to the paper. X.W. and R.C.N. contributed to formulation of the idea, data preparation, data analysis, data interpretation, and writing of the manuscript. M.B.S. and D.B. contributed to data preparation, data interpretation, and review of the manuscript. F.D. contributed to formulation of the idea, study design, data interpretation, funding, and writing of the manuscript. All authors contributed to the interpretation of the results and critical revision of the manuscript for important intellectual content and approved the final version of the manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. F.D. is the guarantor.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Data and code are publicly available at https://github.com/wxwx1993/PM_COVID. Additional data related to this paper may be requested from the authors.