-
The Metropolis algorithm: A useful tool for epidemiologists
Authors:
Alexander P Keil,
Jessie K Edwards,
Ashley I Naimi,
Stephen R Cole
Abstract:
The Metropolis algorithm is a Markov chain Monte Carlo (MCMC) algorithm used to simulate from parameter distributions of interest, such as generalized linear model parameters. The "Metropolis step" is a keystone concept that underlies classical and modern MCMC methods and facilitates simple analysis of complex statistical models. Beyond Bayesian analysis, MCMC is useful for generating uncertainty…
▽ More
The Metropolis algorithm is a Markov chain Monte Carlo (MCMC) algorithm used to simulate from parameter distributions of interest, such as generalized linear model parameters. The "Metropolis step" is a keystone concept that underlies classical and modern MCMC methods and facilitates simple analysis of complex statistical models. Beyond Bayesian analysis, MCMC is useful for generating uncertainty intervals, even under the common scenario in causal inference in which the target parameter is not directly estimated by a single, fitted statistical model. We demonstrate, with a worked example, pseudo-code, and R code, the basic mechanics of the Metropolis algorithm. We use the Metropolis algorithm to estimate the odds ratio and risk difference contrasting the risk of childhood leukemia among those exposed to high versus low level magnetic fields. This approach can be used for inference from Bayesian and frequentist paradigms and, in small samples, offers advantages over large-sample methods like the bootstrap.
△ Less
Submitted 29 August, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Identifying and estimating effects of sustained interventions under parallel trends assumptions
Authors:
Audrey Renson,
Michael Hudgens,
Alexander Keil,
Paul Zivich,
Allison Aiello
Abstract:
Many research questions in public health and medicine concern sustained interventions in populations defined by substantive priorities. Existing methods to answer such questions typically require a measured covariate set sufficient to control confounding, which can be questionable in observational studies. Differences-in-differences relies instead on the parallel trends assumption, allowing for so…
▽ More
Many research questions in public health and medicine concern sustained interventions in populations defined by substantive priorities. Existing methods to answer such questions typically require a measured covariate set sufficient to control confounding, which can be questionable in observational studies. Differences-in-differences relies instead on the parallel trends assumption, allowing for some types of time-invariant unmeasured confounding. However, most existing difference-in-differences implementations are limited to point treatments in restricted subpopulations. We derive identification results for population effects of sustained treatments under parallel trends assumptions. In particular, in settings where all individuals begin follow-up with exposure status consistent with the treatment plan of interest but may deviate at later times, a version of Robins' g-formula identifies the intervention-specific mean under SUTVA, positivity, and parallel trends. We develop consistent asymptotically normal estimators based on inverse-probability weighting, outcome regression, and a double robust estimator based on targeted maximum likelihood. Simulation studies confirm theoretical results and support the use of the proposed estimators at realistic sample sizes. As an example, the methods are used to estimate the effect of a hypothetical federal stay-at-home order on all-cause mortality during the COVID-19 pandemic in spring 2020 in the United States.
△ Less
Submitted 29 September, 2022; v1 submitted 12 June, 2022;
originally announced June 2022.
-
A quantile-based g-computation approach to addressing the effects of exposure mixtures
Authors:
Alexander P. Keil,
Jessie P. Buckley,
Katie M. OBrien,
Kelly K. Ferguson,
Shanshan Zhao,
Alexandra J. White
Abstract:
Exposure mixtures frequently occur in data across many domains, particularly in the fields of environmental and nutritional epidemiology. Various strategies have arisen to answer questions about mixtures, including methods such as weighted quantile sum (WQS) regression that estimate a joint effect of the mixture components.We demonstrate a new approach to estimating the joint effects of a mixture:…
▽ More
Exposure mixtures frequently occur in data across many domains, particularly in the fields of environmental and nutritional epidemiology. Various strategies have arisen to answer questions about mixtures, including methods such as weighted quantile sum (WQS) regression that estimate a joint effect of the mixture components.We demonstrate a new approach to estimating the joint effects of a mixture: quantile g-computation. This approach combines the inferential simplicity of WQS regression with the flexibility of g-computation, a method of causal effect estimation. We use simulations to examine whether quantile g-computation and WQS regression can accurately and precisely estimate effects of mixtures in common scenarios. We examine the bias, confidence interval coverage, and bias-variance tradeoff of quantile g-computation and WQS regression, and how these quantities are impacted by the presence of non-causal exposures, exposure correlation, unmeasured confounding, and non-linear effects. Quantile g-computation, unlike WQS regression allows inference on mixture effects that is unbiased with appropriate confidence interval coverage at sample sizes typically encountered in epidemiologic studies and when the assumptions of WQS regression are not met. Further, WQS regression can magnify bias from unmeasured confounding that might occur if important components of the mixture are omitted. Unlike inferential approaches that examine effects of individual exposures, methods like quantile g-computation that can estimate the effect of a mixture are essential for understanding effects of potential public health actions that act on exposure sources. Our approach may serve to help bridge gaps between epidemiologic analysis and interventions such as regulations on industrial emissions or mining processes, dietary changes, or consumer behavioral changes that act on multiple exposures simultaneously.
△ Less
Submitted 11 March, 2020; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Super learning in the SAS system
Authors:
Alexander P. Keil,
Daniel Westreich,
Jessie K Edwards,
Stephen R Cole
Abstract:
Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. An implementation of stacking, called super learning, has been developed as a general approach to supervised learning and has seen frequent usage, in part due to the availability of an R package. We develop super…
▽ More
Background and objective: Stacking is an ensemble machine learning method that averages predictions from multiple other algorithms, such as generalized linear models and regression trees. An implementation of stacking, called super learning, has been developed as a general approach to supervised learning and has seen frequent usage, in part due to the availability of an R package. We develop super learning in the SAS software system using a new macro, and demonstrate its performance relative to the R package.
Methods: Following previous work using the R SuperLearner package we assess the performance of super learning in a number of domains. We compare the R package with the new SAS macro in a small set of simulations assessing curve fitting in a predictive model as well in a set of 14 publicly available datasets to assess cross-validated accuracy.
Results: Across the simulated data and the publicly available data, the SAS macro performed similarly to the R package, despite a different set of potential algorithms available natively in R and SAS.
Conclusions: Our super learner macro performs as well as the R package at a number of tasks. Further, by extending the macro to include the use of R packages, the macro can leverage both the robust, enterprise oriented procedures in SAS and the nimble, cutting edge packages in R. In the spirit of ensemble learning, this macro extends the potential library of algorithms beyond a single software system and provides a simple avenue into machine learning in SAS.
△ Less
Submitted 31 July, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.
-
A Bayesian approach to the g-formula
Authors:
Alexander P. Keil,
Eric J. Daza,
Stephanie M. Engel,
Jessie P. Buckley,
Jessie K. Edwards
Abstract:
Epidemiologists often wish to estimate quantities that are easy to communicate and correspond to the results of realistic public health scenarios. Methods from causal inference can answer these questions. We adopt the language of potential outcomes under Rubin's original Bayesian framework and show that the parametric g-formula is easily amenable to a Bayesian approach. We show that the frequentis…
▽ More
Epidemiologists often wish to estimate quantities that are easy to communicate and correspond to the results of realistic public health scenarios. Methods from causal inference can answer these questions. We adopt the language of potential outcomes under Rubin's original Bayesian framework and show that the parametric g-formula is easily amenable to a Bayesian approach. We show that the frequentist properties of the Bayesian g-formula suggest it improves the accuracy of estimates of causal effects in small samples or when data may be sparse. We demonstrate our approach to estimate the effect of environmental tobacco smoke on body mass index z-scores among children aged 4-9 years who were enrolled in a longitudinal birth cohort in New York, USA. We give a general algorithm and supply SAS and Stan code that can be adopted to implement our computational approach in both time-fixed and longitudinal data.
△ Less
Submitted 15 December, 2015;
originally announced December 2015.