-
Estimating Associations Between Cumulative Exposure and Health via Generalized Distributed Lag Non-Linear Models using Penalized Splines
Authors:
Tianyi Pan,
Hwashin Hyun Shin,
Glen McGee,
Alex Stringer
Abstract:
Quantifying associations between short-term exposure to ambient air pollution and health outcomes is an important public health priority. Many studies have investigated the association considering delayed effects within the past few days. Adaptive cumulative exposure distributed lag non-linear models (ACE-DLNMs) quantify associations between health outcomes and cumulative exposure that is specified in a data-adaptive way. While the ACE-DLNM framework is highly interpretable, it is limited to continuous outcomes and does not scale well to large datasets. Motivated by a large analysis of daily pollution and respiratory hospitalization counts in Canada between 2001 and 2018, we propose a generalized ACE-DLNM incorporating penalized splines, improving upon existing ACE-DLNM methods to accommodate general response types. We then develop a computationally efficient estimation strategy based on profile likelihood and Laplace approximate marginal likelihood with Newton-type methods. We demonstrate the performance and practical advantages of the proposed method through simulations. In application to the motivating analysis, the proposed method yields more stable inferences compared to generalized additive models with fixed exposures, while retaining interpretability.
Submitted 21 May, 2025;
originally announced May 2025.
-
A general approach to modeling environmental mixtures with multivariate outcomes
Authors:
Glen McGee,
Joseph Antonelli
Abstract:
An important goal of environmental health research is to assess the health risks posed by mixtures of multiple environmental exposures. In these mixtures analyses, flexible models like Bayesian kernel machine regression and multiple index models are appealing because they allow for arbitrary non-linear exposure-outcome relationships. However, this flexibility comes at the cost of low power, particularly when exposures are highly correlated and the health effects are weak, as is typical in environmental health studies. We propose an adaptive index modelling strategy that borrows strength across exposures and outcomes by exploiting similar mixture component weights and exposure-response relationships. In the special case of distributed lag models, in which exposures are measured repeatedly over time, we jointly encourage co-clustering of lag profiles and exposure-response curves to more efficiently identify critical windows of vulnerability and characterize important exposure effects. We then extend the proposed approach to the multivariate index model setting where the true index structure -- the number of indices and their composition -- is unknown, and introduce variable importance measures to quantify component contributions to mixture effects. Using time series data from the National Morbidity, Mortality and Air Pollution Study, we demonstrate the proposed methods by jointly modelling three mortality outcomes and two cumulative air pollution measurements with a maximum lag of 14 days.
Submitted 23 April, 2025;
originally announced April 2025.
-
Collapsible Kernel Machine Regression for Exposomic Analyses
Authors:
Glen McGee,
Brent A. Coull,
Ander Wilson
Abstract:
An important goal of environmental epidemiology is to quantify the complex health risks posed by a wide array of environmental exposures. In analyses focusing on a smaller number of exposures within a mixture, flexible models like Bayesian kernel machine regression (BKMR) are appealing because they allow for non-linear and non-additive associations among mixture components. However, this flexibility comes at the cost of low power and difficult interpretation, particularly in exposomic analyses when the number of exposures is large. We propose a flexible framework that allows for separate selection of additive and non-additive effects, unifying additive models and kernel machine regression. The proposed approach yields increased power and simpler interpretation when there is little evidence of interaction. Further, it allows users to specify separate priors for additive and non-additive effects, and allows for tests of non-additive interaction. We extend the approach to the class of multiple index models, in which kernel machine-distributed lag models are nested as a special case. We apply the method to motivating data from a subcohort of the Human Early Life Exposome (HELIX) study containing 65 mixture components grouped into 13 distinct exposure classes.
Submitted 26 September, 2024;
originally announced September 2024.
-
Flexible Marginal Models for Dependent Data
Authors:
Glen McGee,
Alex Stringer
Abstract:
Models for dependent data are distinguished by their targets of inference. Marginal models are useful when interest lies in quantifying associations averaged across a population of clusters. When the functional form of a covariate-outcome association is unknown, flexible regression methods are needed to allow for potentially non-linear relationships. We propose a novel marginal additive model (MAM) for modelling cluster-correlated data with non-linear population-averaged associations. The proposed MAM is a unified framework for estimation and uncertainty quantification of a marginal mean model, combined with inference for between-cluster variability and cluster-specific prediction. We propose a fitting algorithm that enables efficient computation of standard errors and corrects for estimation of penalty terms. We demonstrate the proposed methods in simulations and in application to (i) a longitudinal study of beaver foraging behaviour, and (ii) a spatial analysis of Loa loa infection in West Africa. R code for implementing the proposed methodology is available at https://github.com/awstringer1/mam.
Submitted 14 April, 2022;
originally announced April 2022.
-
Integrating Biological Knowledge in Kernel-Based Analyses of Environmental Mixtures and Health
Authors:
Glen McGee,
Ander Wilson,
Brent A Coull,
Thomas F Webster
Abstract:
A key goal of environmental health research is to assess the risk posed by mixtures of pollutants. As epidemiologic studies of mixtures can be expensive to conduct, it behooves researchers to incorporate prior knowledge about mixtures into their analyses. This work extends the Bayesian multiple index model (BMIM), which assumes the exposure-response function is a non-parametric function of a set of linear combinations of pollutants formed with a set of exposure-specific weights. The framework is attractive because it combines the flexibility of response-surface methods with the interpretability of linear index models. We propose three strategies to incorporate prior toxicological knowledge into construction of indices in a BMIM: (a) constraining index weights, (b) structuring index weights by exposure transformations, and (c) placing informative priors on the index weights. We propose a novel prior specification that combines spike-and-slab variable selection with an informative Dirichlet distribution based on relative potency factors often derived from previous toxicological studies. In simulations we show that the proposed priors improve inferences when prior information is correct and can protect against misspecification suffered by naive toxicological models when prior information is incorrect. Moreover, different strategies may be mixed and matched for different indices to suit available information (or lack thereof). We demonstrate the proposed methods on an analysis of data from the National Health and Nutrition Examination Survey and incorporate prior information on relative chemical potencies obtained from toxic equivalency factors available in the literature.
Submitted 31 March, 2022;
originally announced April 2022.
-
Bayesian Multiple Index Models for Environmental Mixtures
Authors:
Glen McGee,
Ander Wilson,
Thomas F. Webster,
Brent A. Coull
Abstract:
An important goal of environmental health research is to assess the risk posed by mixtures of environmental exposures. Two popular classes of models for mixtures analyses are response-surface methods and exposure-index methods. Response-surface methods estimate high-dimensional surfaces and are thus highly flexible but difficult to interpret. In contrast, exposure-index methods decompose coefficients from a linear model into an overall mixture effect and individual index weights; these models yield easily interpretable effect estimates and efficient inferences when model assumptions hold, but, like most parsimonious models, incur bias when these assumptions do not hold. In this paper we propose a Bayesian multiple index model framework that combines the strengths of each, allowing for non-linear and non-additive relationships between exposure indices and a health outcome, while reducing the dimensionality of the exposure vector and estimating index weights with variable selection. This framework contains response-surface and exposure-index models as special cases, thereby unifying the two analysis strategies. This unification increases the range of models possible for analyzing environmental mixtures and health, allowing one to select an appropriate analysis from a spectrum of models varying in flexibility and interpretability. In an analysis of the association between telomere length and 18 organic pollutants in the National Health and Nutrition Examination Survey (NHANES), the proposed approach fits the data as well as more complex response-surface methods and yields more interpretable results.
Submitted 13 January, 2021;
originally announced January 2021.
-
On the Interplay Between Exposure Misclassification and Informative Cluster Size
Authors:
Glen McGee,
Marianthi-Anna Kioumourtzoglou,
Marc G. Weisskopf,
Sebastien Haneuse,
Brent A. Coull
Abstract:
In this paper we study the impact of exposure misclassification when cluster size is potentially informative (i.e., related to outcomes) and when misclassification is differential by cluster size. First, we show that misclassification in an exposure related to cluster size can induce informativeness when cluster size would otherwise be non-informative. Second, we show that misclassification that is differential by informative cluster size can not only attenuate estimates of exposure effects but even inflate or reverse the sign of estimates. To correct for bias in estimating marginal parameters, we propose two frameworks: (i) an observed likelihood approach for joint marginalized models of cluster size and outcomes and (ii) an expected estimating equations approach. Although we focus on estimating marginal parameters, a corollary is that the observed likelihood approach permits valid inference for conditional parameters as well. Using data from the Nurses' Health Study II, we compare the results of the proposed correction methods when applied to motivating data on the multigenerational effect of in-utero diethylstilbestrol exposure on attention-deficit/hyperactivity disorder in 106,198 children of 47,450 nurses.
Submitted 16 October, 2019;
originally announced October 2019.