Causal Inference with Double/Debiased Machine Learning for Evaluating the Health Effects of Multiple Mismeasured Pollutants
Authors:
Gang Xu,
Xin Zhou,
Molin Wang,
Boya Zhang,
Wenhao Jiang,
Francine Laden,
Helen H. Suh,
Adam A. Szpiro,
Donna Spiegelman,
Zuoheng Wang
Abstract:
One way to quantify exposure to air pollution and its constituents in epidemiologic studies is to use an individual's nearest monitor. This strategy results in potential inaccuracy in the actual personal exposure, introducing bias in estimating the health effects of air pollution and its constituents, especially when evaluating the causal effects of correlated multi-pollutant constituents measured…
▽ More
One way to quantify exposure to air pollution and its constituents in epidemiologic studies is to use an individual's nearest monitor. This strategy results in potential inaccuracy in the actual personal exposure, introducing bias in estimating the health effects of air pollution and its constituents, especially when evaluating the causal effects of correlated multi-pollutant constituents measured with correlated error. This paper addresses estimation and inference for the causal effect of one constituent in the presence of other PM2.5 constituents, accounting for measurement error and correlations. We used a linear regression calibration model, fitted with generalized estimating equations in an external validation study, and extended a double/debiased machine learning (DML) approach to correct for measurement error and estimate the effect of interest in the main study. We demonstrated that the DML estimator with regression calibration is consistent and derived its asymptotic variance. Simulations showed that the proposed estimator reduced bias and attained nominal coverage probability across most simulation settings. We applied this method to assess the causal effects of PM2.5 constituents on cognitive function in the Nurses' Health Study and identified two PM2.5 constituents, Br and Mn, that showed a negative causal effect on cognitive function after measurement error correction.
△ Less
Submitted 21 September, 2024;
originally announced October 2024.
Practical large-scale spatio-temporal modeling of particulate matter concentrations
Authors:
Christopher J. Paciorek,
Jeff D. Yanosky,
Robin C. Puett,
Francine Laden,
Helen H. Suh
Abstract:
The last two decades have seen intense scientific and regulatory interest in the health effects of particulate matter (PM). Influential epidemiological studies that characterize chronic exposure of individuals rely on monitoring data that are sparse in space and time, so they often assign the same exposure to participants in large geographic areas and across time. We estimate monthly PM during 1…
▽ More
The last two decades have seen intense scientific and regulatory interest in the health effects of particulate matter (PM). Influential epidemiological studies that characterize chronic exposure of individuals rely on monitoring data that are sparse in space and time, so they often assign the same exposure to participants in large geographic areas and across time. We estimate monthly PM during 1988--2002 in a large spatial domain for use in studying health effects in the Nurses' Health Study. We develop a conceptually simple spatio-temporal model that uses a rich set of covariates. The model is used to estimate concentrations of $PM_{10}$ for the full time period and $PM_{2.5}$ for a subset of the period. For the earlier part of the period, 1988--1998, few $PM_{2.5}$ monitors were operating, so we develop a simple extension to the model that represents $PM_{2.5}$ conditionally on $PM_{10}$ model predictions. In the epidemiological analysis, model predictions of $PM_{10}$ are more strongly associated with health effects than when using simpler approaches to estimate exposure. Our modeling approach supports the application in estimating both fine-scale and large-scale spatial heterogeneity and capturing space--time interaction through the use of monthly-varying spatial surfaces. At the same time, the model is computationally feasible, implementable with standard software, and readily understandable to the scientific audience. Despite simplifying assumptions, the model has good predictive performance and uncertainty characterization.
△ Less
Submitted 8 June, 2009;
originally announced June 2009.