-
Identification and estimation of vaccine effectiveness in the test-negative design under equi-confounding
Authors:
Christopher B. Boyer,
Kendrick Qijun Li,
Xu Shi,
Eric J. Tchetgen Tchetgen
Abstract:
The test-negative design (TND) is frequently used to evaluate vaccine effectiveness in real-world settings. In a TND study, individuals with similar symptoms who seek care are tested for the disease of interest, and vaccine effectiveness is estimated by comparing the vaccination history of test-positive cases and test-negative controls. Traditional approaches justify the TND by assuming either (a) receiving a test is a perfect proxy for unmeasured health-seeking behavior or (b) vaccination is unconfounded given measured covariates -- both of which may be unrealistic in practice. In this paper, we return to the original motivation for the TND and propose an alternative justification based on the assumption of odds ratio equi-confounding, where unmeasured confounders influence test-positive and test-negative individuals equivalently on the odds ratio scale. We discuss the implications of this assumption for TND design and provide alternative estimators for the marginal risk ratio among the vaccinated under equi-confounding, including estimators based on outcome modeling and inverse probability weighting as well as a semiparametric estimator that is doubly-robust. When the equi-confounding assumption does not hold, we suggest a sensitivity analysis that parameterizes the magnitude of the deviation on the odds ratio scale. We conduct a simulation study to evaluate the empirical performance of our proposed estimators under a wide range of scenarios.
Submitted 11 May, 2025; v1 submitted 28 April, 2025;
originally announced April 2025.
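As a point of reference for the comparison of vaccination history between test-positive cases and test-negative controls described in the abstract above, the following is a minimal sketch of the conventional crude TND estimate, VE = 1 - OR, computed from a 2x2 table. It is not the equi-confounding estimator proposed in the paper; the function name and argument names are hypothetical and chosen for illustration only.

```python
def tnd_vaccine_effectiveness(vacc_pos, unvacc_pos, vacc_neg, unvacc_neg):
    """Crude test-negative-design estimate: VE = 1 - odds ratio.

    Illustrative only: this is the conventional unadjusted estimator,
    not the equi-confounding estimators described in the abstract.
    Arguments are cell counts of a 2x2 table (hypothetical names):
      vacc_pos   -- vaccinated, test-positive (cases)
      unvacc_pos -- unvaccinated, test-positive (cases)
      vacc_neg   -- vaccinated, test-negative (controls)
      unvacc_neg -- unvaccinated, test-negative (controls)
    """
    counts = (vacc_pos, unvacc_pos, vacc_neg, unvacc_neg)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Odds of vaccination among cases divided by odds among controls.
    odds_ratio = (vacc_pos * unvacc_neg) / (unvacc_pos * vacc_neg)
    return 1.0 - odds_ratio


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    print(tnd_vaccine_effectiveness(20, 80, 50, 50))
```

The sketch ignores covariate adjustment entirely; in practice the odds ratio would usually come from a regression model rather than raw counts.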
-
Beyond Fixed Restriction Time: Adaptive Restricted Mean Survival Time Methods in Clinical Trials
Authors:
Jinghao Sun,
Douglas E. Schaubel,
Eric J. Tchetgen Tchetgen
Abstract:
Restricted mean survival time (RMST) offers a compelling nonparametric alternative to hazard ratios for right-censored time-to-event data, particularly when the proportional hazards assumption is violated. By capturing the total event-free time over a specified horizon, RMST provides an intuitive and clinically meaningful measure of absolute treatment benefit. Nonetheless, selecting the restriction time $L$ poses challenges: choosing a small $L$ can overlook late-emerging benefits, whereas a large $L$, often underestimated in its impact, may inflate variance and undermine power. We propose a novel data-driven, adaptive procedure that identifies the optimal restriction time $L^*$ from a continuous range by maximizing a criterion balancing effect size and estimation precision. Consequently, our procedure is particularly useful when the pattern of the treatment effect is unknown at the design stage. We provide a rigorous theoretical foundation that accounts for variability introduced by adaptively choosing $L^*$. To address nonregular estimation under the null, we develop two complementary strategies: a convex-hull-based estimator, and a penalized approach that further enhances power. Additionally, when restriction time candidates are defined on a discrete grid, we propose a procedure that surprisingly incurs no asymptotic penalty for selection, thus achieving oracle performance. Extensive simulations across realistic survival scenarios demonstrate that our method outperforms traditional RMST analyses and the log-rank test, achieving superior power while maintaining nominal Type I error rates. In a phase III pancreatic cancer trial with transient treatment effects, our procedure uncovers clinically meaningful benefits that standard methods overlook. Our methods are implemented in the R package AdaRMST.
Submitted 25 January, 2025;
originally announced January 2025.
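To make the quantity in the abstract above concrete: for a fixed restriction time $L$, RMST is the area under the survival curve from 0 to $L$. The sketch below computes it from a hand-rolled Kaplan-Meier estimate. It illustrates the standard fixed-$L$ definition only; it does not implement the adaptive selection of $L^*$ and is unrelated to the AdaRMST package's interface. All function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, horizon):
    """Area under the Kaplan-Meier step function on [0, horizon]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Final rectangle from the last jump before the horizon up to it.
    area += prev_s * (horizon - prev_t)
    return area


if __name__ == "__main__":
    # Toy data: with no censoring, RMST equals the mean of min(T, L).
    print(rmst([1, 2, 3, 4], [1, 1, 1, 1], horizon=4))
```

Evaluating `rmst` over a grid of horizons shows the trade-off the abstract describes: a larger horizon accumulates more of the curve but relies on the sparsely observed tail.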
-
Doubly Robust and Efficient Calibration of Prediction Sets for Censored Time-to-Event Outcomes
Authors:
Rebecca Farina,
Eric J. Tchetgen Tchetgen,
Arun Kumar Kuchibhotla
Abstract:
Our objective is to construct well-calibrated prediction sets for a time-to-event outcome subject to right-censoring with guaranteed coverage. Our approach is inspired by modern conformal inference literature in that, unlike classical frameworks, we obviate the need for a well-specified parametric or semiparametric survival model to accomplish our goal. In contrast to existing conformal prediction methods for survival data, which restrict censoring to be of Type I, whereby potential censoring times are assumed to be fully observed on all units in both training and validation samples, we consider the more common right-censoring setting in which either only the censoring time or only the event time of primary interest is directly observed, whichever comes first. Under a standard conditional independence assumption between the potential survival and censoring times given covariates, we propose and analyze two methods to construct valid and efficient lower predictive bounds for the survival time of a future observation. The proposed methods build upon modern semiparametric efficiency theory for censored data, in that the first approach incorporates inverse-probability-of-censoring weighting to account for censoring, while the second approach is based on augmenting this method with an additional correction term. For both methods, we formally establish asymptotic coverage guarantees and demonstrate, both theoretically and through empirical experiments, that the augmented approach substantially improves efficiency over the inverse-probability-of-censoring weighting method. Specifically, its coverage error bound is of second-order mixed bias type, that is doubly robust, and therefore guaranteed to be asymptotically negligible relative to the coverage error of the non-augmented method.
Submitted 12 March, 2025; v1 submitted 8 January, 2025;
originally announced January 2025.
-
The Nudge Average Treatment Effect
Authors:
Eric J Tchetgen Tchetgen
Abstract:
The instrumental variable method is a prominent approach to recover, under certain conditions, valid inference about a treatment causal effect even when unmeasured confounding might be present. In a groundbreaking paper, Imbens and Angrist (1994) established that a valid instrument nonparametrically identifies the average causal effect among compliers, also known as the local average treatment effect, under a certain monotonicity assumption which rules out the existence of so-called defiers. An often-cited attractive property of monotonicity is that it facilitates a causal interpretation of the instrumental variable estimand without restricting the degree of heterogeneity of the treatment causal effect. In this paper, we introduce an alternative, equally straightforward and interpretable condition for identification, which accommodates both the presence of defiers and heterogeneous treatment effects. Mainly, we show that under our new conditions, the instrumental variable estimand recovers the average causal effect for the subgroup of units for whom the treatment is manipulable by the instrument, a subgroup which may consist of both defiers and compliers, therefore recovering an effect estimand we aptly call the Nudge Average Treatment Effect.
Submitted 30 October, 2024;
originally announced October 2024.
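For readers unfamiliar with the "instrumental variable estimand" referred to in the abstract above, a minimal sketch of its usual sample analogue, the Wald ratio cov(Z, Y) / cov(Z, A), is given below. This only computes the estimand; which causal effect it recovers depends on the identifying conditions (monotonicity, or the paper's alternative), which the code does not check. The function name and the variable names `z`, `a`, `y` are assumptions made for illustration.

```python
def wald_ratio(z, a, y):
    """Sample Wald ratio cov(z, y) / cov(z, a).

    z -- instrument values, a -- treatment values, y -- outcome values
    (hypothetical names; equal-length sequences of numbers).
    """
    n = len(z)
    if n == 0 or len(a) != n or len(y) != n:
        raise ValueError("z, a and y must be non-empty and of equal length")

    def cov(u, v):
        mu = sum(u) / n
        mv = sum(v) / n
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / n

    denom = cov(z, a)
    if denom == 0:
        raise ValueError("instrument is uncorrelated with treatment in sample")
    return cov(z, y) / denom


if __name__ == "__main__":
    # Toy data in which the outcome is exactly 1 + 2 * treatment.
    z = [0, 0, 0, 0, 1, 1, 1, 1]
    a = [0, 0, 0, 1, 0, 1, 1, 1]
    y = [1 + 2 * ai for ai in a]
    print(wald_ratio(z, a, y))
```

For a binary instrument this ratio equals the difference in mean outcome between instrument arms divided by the difference in mean treatment uptake.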
-
Regression-based proximal causal inference for right-censored time-to-event data
Authors:
Kendrick Li,
George C. Linderman,
Xu Shi,
Eric J. Tchetgen Tchetgen
Abstract:
Unmeasured confounding is one of the major concerns in causal inference from observational data. Proximal causal inference (PCI) is an emerging methodological framework to detect and potentially account for confounding bias by carefully leveraging a pair of negative control exposure (NCE) and outcome (NCO) variables, also known as treatment and outcome confounding proxies. Although regression-based PCI is well developed for binary and continuous outcomes, analogous PCI regression methods for right-censored time-to-event outcomes are currently lacking. In this paper, we propose a novel two-stage regression PCI approach for right-censored survival data under an additive hazard structural model. We provide theoretical justification for the proposed approach tailored to different types of NCOs, including continuous, count, and right-censored time-to-event variables. We illustrate the approach with an evaluation of the effectiveness of right heart catheterization among critically ill patients using data from the SUPPORT study. Our method is implemented in the open-access R package 'pci2s'.
Submitted 23 April, 2025; v1 submitted 13 September, 2024;
originally announced September 2024.
-
Regression-Based Proximal Causal Inference
Authors:
Jiewen Liu,
Chan Park,
Kendrick Li,
Eric J. Tchetgen Tchetgen
Abstract:
Negative controls are increasingly used to evaluate the presence of potential unmeasured confounding in observational studies. Beyond the use of negative controls to detect the presence of residual confounding, proximal causal inference (PCI) was recently proposed to de-bias confounded causal effect estimates, by leveraging a pair of treatment and outcome negative control or confounding proxy variables. While formal methods for statistical inference have been developed for PCI, these methods can be challenging to implement as they involve solving complex integral equations that are typically ill-posed. We develop a regression-based PCI approach, employing two-stage generalized linear regression models (GLMs) to implement PCI, which obviates the need to solve difficult integral equations. The proposed approach has merit in that (i) it is applicable to continuous, count, and binary outcomes cases, making it relevant to a wide range of real-world applications, and (ii) it is easy to implement using off-the-shelf software for GLMs. We establish the statistical properties of regression-based PCI and illustrate their performance in both synthetic and real-world empirical applications.
Submitted 5 June, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Proximal Causal Inference for Synthetic Control with Surrogates
Authors:
Jizhou Liu,
Eric J. Tchetgen Tchetgen,
Carlos Varjão
Abstract:
The synthetic control method (SCM) has become a popular tool for estimating causal effects in policy evaluation, where a single treated unit is observed, and a heterogeneous set of untreated units with pre- and post-policy change data are also observed. However, the synthetic control method faces challenges in accurately predicting the post-intervention potential outcome had, contrary to fact, the treatment been withheld, when the pre-intervention period is short or the post-intervention period is long. To address these issues, we propose a novel method that leverages post-intervention information, specifically time-varying correlates of the causal effect called "surrogates", within the synthetic control framework. We establish conditions for identifying model parameters using the proximal inference framework and apply the generalized method of moments (GMM) approach for estimation and inference about the average treatment effect on the treated (ATT). Interestingly, we uncover specific conditions under which exclusively using post-intervention data suffices for estimation within our framework. Moreover, we explore several extensions, including covariate adjustment, relaxing linearity assumptions through non-parametric identification, and incorporating so-called "contaminated" surrogates, which do not exactly satisfy conditions to be valid surrogates but nevertheless can be incorporated via a simple modification of the proposed approach. Through a simulation study, we demonstrate that our method can outperform other synthetic control methods in estimating both short-term and long-term effects, yielding more accurate inferences. In an empirical application examining the Panic of 1907, one of the worst financial crises in U.S. history, we confirm the practical relevance of our theoretical results.
Submitted 18 August, 2023;
originally announced August 2023.
-
Using Instruments for Selection to Adjust for Selection Bias in Mendelian Randomization
Authors:
Apostolos Gkatzionis,
Eric J. Tchetgen Tchetgen,
Jon Heron,
Kate Northstone,
Kate Tilling
Abstract:
Selection bias is a common concern in epidemiologic studies. In the literature, selection bias is often viewed as a missing data problem. Popular approaches to adjust for bias due to missing data, such as inverse probability weighting, rely on the assumption that data are missing at random and can yield biased results if this assumption is violated. In observational studies with outcome data missing not at random, Heckman's sample selection model can be used to adjust for bias due to missing data. In this paper, we review Heckman's method and a similar approach proposed by Tchetgen Tchetgen and Wirth (2017). We then discuss how to apply these methods to Mendelian randomization analyses using individual-level data, with missing data for either the exposure or outcome or both. We explore whether genetic variants associated with participation can be used as instruments for selection. We then describe how to obtain missingness-adjusted Wald ratio, two-stage least squares and inverse variance weighted estimates. The two methods are evaluated and compared in simulations, with results suggesting that they can both mitigate selection bias but may yield parameter estimates with large standard errors in some settings. In an illustrative real-data application, we investigate the effects of body mass index on smoking using data from the Avon Longitudinal Study of Parents and Children.
Submitted 13 April, 2024; v1 submitted 4 August, 2022;
originally announced August 2022.
-
A self-censoring model for multivariate nonignorable nonmonotone missing data
Authors:
Yilin Li,
Wang Miao,
Ilya Shpitser,
Eric J. Tchetgen Tchetgen
Abstract:
We introduce a self-censoring model for multivariate nonignorable nonmonotone missing data, where the missingness process of each outcome is affected by its own value and is associated with missingness indicators of other outcomes, while conditionally independent of the other outcomes. The self-censoring model complements previous graphical approaches for the analysis of multivariate nonignorable missing data. It is identified under a completeness condition stating that any variability in one outcome can be captured by variability in the other outcomes among complete cases. For estimation, we propose a suite of semiparametric estimators including doubly robust estimators that deliver valid inferences under partial misspecification of the full-data distribution. We evaluate the performance of the proposed estimators with simulations and apply them to analyze a study about the effect of highly active antiretroviral therapy on preterm delivery of HIV-positive mothers.
Submitted 30 September, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Proximal Causal Inference for Marginal Counterfactual Survival Curves
Authors:
Andrew Ying,
Yifan Cui,
Eric J. Tchetgen Tchetgen
Abstract:
Contrasting marginal counterfactual survival curves across treatment arms is an effective and popular approach for inferring the causal effect of an intervention on a right-censored time-to-event outcome. A key challenge to drawing such inferences in observational settings is the possible existence of unmeasured confounding, which may invalidate most commonly used methods that assume no hidden confounding bias. In this paper, rather than making the standard no unmeasured confounding assumption, we extend the recently proposed proximal causal inference framework of Miao et al. (2018), Tchetgen et al. (2020), Cui et al. (2020) to obtain nonparametric identification of a causal survival contrast by leveraging observed covariates as imperfect proxies of unmeasured confounders. Specifically, we develop a proximal inverse probability-weighted (PIPW) estimator, the proximal analog of standard IPW, which allows the observed data distribution for the time-to-event outcome to remain completely unrestricted. PIPW estimation relies on a parametric model for a so-called treatment confounding bridge function relating the treatment process to confounding proxies. As a result, PIPW might be sensitive to model misspecification. To improve robustness and efficiency, we also propose a proximal doubly robust estimator and establish uniform consistency and asymptotic normality of both estimators. We conduct extensive simulations to examine the finite sample performance of our estimators, and proposed methods are applied to a study evaluating the effectiveness of right heart catheterization in the intensive care unit of critically ill patients.
Submitted 27 April, 2022;
originally announced April 2022.
-
Using negative controls to identify causal effects with invalid instrumental variables
Authors:
Oliver Dukes,
David B. Richardson,
Zachary Shahn,
James M. Robins,
Eric J. Tchetgen Tchetgen
Abstract:
Many proposals for the identification of causal effects require an instrumental variable that satisfies strong, untestable unconfoundedness and exclusion restriction assumptions. In this paper, we show how one can potentially identify causal effects under violations of these assumptions by harnessing a negative control population or outcome. This strategy allows one to leverage sub-populations for whom the exposure is degenerate, and requires that the instrument-outcome association satisfies a certain parallel trend condition. We develop the semiparametric efficiency theory for a general instrumental variable model, and obtain a multiply robust, locally efficient estimator of the average treatment effect in the treated. The utility of the estimators is demonstrated in simulation studies and an analysis of the Life Span Study.
Submitted 23 January, 2025; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Proximal mediation analysis
Authors:
Oliver Dukes,
Ilya Shpitser,
Eric J. Tchetgen Tchetgen
Abstract:
A common concern when trying to draw causal inferences from observational data is that the measured covariates are insufficiently rich to account for all sources of confounding. In practice, many of the covariates may only be proxies of the latent confounding mechanism. Recent work has shown that in certain settings where the standard 'no unmeasured confounding' assumption fails, proxy variables can be leveraged to identify causal effects. Results currently exist for the total causal effect of an intervention, but little consideration has been given to learning about the direct or indirect pathways of the effect through a mediator variable. In this work, we describe three separate proximal identification results for natural direct and indirect effects in the presence of unmeasured confounding. We then develop a semiparametric framework for inference on natural (in)direct effects, which leads us to locally efficient, multiply robust estimators.
Submitted 30 August, 2023; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Proximal Causal Inference for Complex Longitudinal Studies
Authors:
Andrew Ying,
Wang Miao,
Xu Shi,
Eric J. Tchetgen Tchetgen
Abstract:
A standard assumption for causal inference about the joint effects of time-varying treatment is that one has measured sufficient covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values, also known as the "sequential randomization assumption" (SRA). SRA is often criticized as it requires one to accurately measure all confounders. Realistically, measured covariates can rarely capture all confounders with certainty. Often covariate measurements are at best proxies of confounders, thus invalidating inferences under SRA. In this paper, we extend the proximal causal inference (PCI) framework of Miao et al. (2018) to the longitudinal setting under a semiparametric marginal structural mean model (MSMM). PCI offers an opportunity to learn about joint causal effects in settings where SRA based on measured time-varying covariates fails, by formally accounting for the covariate measurements as imperfect proxies of underlying confounding mechanisms. We establish nonparametric identification with a pair of time-varying proxies and provide a corresponding characterization of regular and asymptotically linear estimators of the parameter indexing the MSMM, including a rich class of doubly robust estimators, and establish the corresponding semiparametric efficiency bound for the MSMM. Extensive simulation studies and a data application illustrate the finite sample behavior of proposed methods.
Submitted 3 August, 2022; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Identification and Estimation of Causal Peer Effects Using Double Negative Controls for Unmeasured Network Confounding
Authors:
Naoki Egami,
Eric J. Tchetgen Tchetgen
Abstract:
Scientists have been interested in estimating causal peer effects to understand how people's behaviors are affected by their network peers. However, it is well known that identification and estimation of causal peer effects are challenging in observational studies for two reasons. The first is the identification challenge due to unmeasured network confounding, for example, homophily bias and contextual confounding. The second issue is network dependence of observations, which one must take into account for valid statistical inference. Negative control variables, also known as placebo variables, have been widely used in observational studies including peer effect analysis over networks, although they have been used primarily for bias detection. In this article, we establish a formal framework which leverages a pair of negative control outcome and exposure variables (double negative controls) to nonparametrically identify causal peer effects in the presence of unmeasured network confounding. We then propose a generalized method of moments estimator for causal peer effects, and establish its consistency and asymptotic normality under an assumption about $ψ$-network dependence. Finally, we provide a network heteroskedasticity and autocorrelation consistent variance estimator. Our methods are illustrated with an application to peer effects in education.
Submitted 4 September, 2021;
originally announced September 2021.
-
The Proximal ID Algorithm
Authors:
Ilya Shpitser,
Zach Wood-Doughty,
Eric J. Tchetgen Tchetgen
Abstract:
Unobserved confounding is a fundamental obstacle to establishing valid causal conclusions from observational data. Two complementary types of approaches have been developed to address this obstacle: obtaining identification using fortuitous external aids, such as instrumental variables or proxies, or by means of the ID algorithm, using Markov restrictions on the full data distribution encoded in graphical causal models. In this paper we aim to develop a synthesis of the former and latter approaches to identification in causal inference to yield the most general identification algorithm in multivariate systems currently known -- the proximal ID algorithm. In addition to being able to obtain nonparametric identification in all cases where the ID algorithm succeeds, our approach allows us to systematically exploit proxies to adjust for the presence of unobserved confounders that would have otherwise prevented identification. In addition, we outline a class of estimation strategies for causal parameters identified by our method in an important special case. We illustrate our approach by simulation studies and a data application.
Submitted 25 June, 2023; v1 submitted 15 August, 2021;
originally announced August 2021.
-
A New Causal Approach to Account for Treatment Switching in Randomized Experiments under a Structural Cumulative Survival Model
Authors:
Andrew Ying,
Eric J. Tchetgen Tchetgen
Abstract:
Treatment switching in a randomized controlled trial is said to occur when a patient randomized to one treatment arm switches to another treatment arm during follow-up. This can occur at the point of disease progression, whereby patients in the control arm may be offered the experimental treatment. It is widely known that failure to account for treatment switching can seriously dilute the estimated effect of treatment on overall survival. In this paper, we aim to account for the potential impact of treatment switching in a re-analysis evaluating the treatment effect of Nucleoside Reverse Transcriptase Inhibitors (NRTIs) on a safety outcome (time to first severe or worse sign or symptom) in participants receiving a new antiretroviral regimen that either included or omitted NRTIs in the Optimized Treatment That Includes or Omits NRTIs (OPTIONS) trial. We propose an estimator of a treatment causal effect under a structural cumulative survival model (SCSM) that leverages randomization as an instrumental variable to account for selective treatment switching. Unlike Robins' accelerated failure time model often used to address treatment switching, the proposed approach avoids the need for artificial censoring for estimation. We establish that the proposed estimator is uniformly consistent and asymptotically Gaussian under standard regularity conditions. A consistent variance estimator is also given and a simple resampling approach provides uniform confidence bands for the causal difference comparing treatment groups over time on the cumulative intensity scale. We develop an R package named "ivsacim" implementing all proposed methods, freely available to download from R CRAN. We examine the finite-sample performance of the estimator via extensive simulations.
Submitted 22 March, 2021;
originally announced March 2021.
-
An Introduction to Proximal Causal Learning
Authors:
Eric J Tchetgen Tchetgen,
Andrew Ying,
Yifan Cui,
Xu Shi,
Wang Miao
Abstract:
A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on investigators' ability to accurately measure covariates capturing all potential sources of confounding. Realistically, confounding mechanisms can rarely, if ever, be learned with certainty from measured covariates. One can therefore only ever hope that covariate measurements are at best proxies of true underlying confounding mechanisms operating in an observational study, thus invalidating causal claims made on the basis of standard exchangeability conditions. Causal learning from proxies is a challenging inverse problem which has to date remained unresolved. In this paper, we introduce a formal potential outcome framework for proximal causal learning, which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails. Sufficient conditions for nonparametric identification are given, leading to the proximal g-formula and corresponding proximal g-computation algorithm for estimation. These may be viewed as generalizations of Robins' foundational g-formula and g-computation algorithm, which account explicitly for bias due to unmeasured confounding. Both point treatment and time-varying treatment settings are considered, and an application of proximal g-computation of causal effects is given for illustration.
Submitted 23 September, 2020;
originally announced September 2020.
-
A coherent likelihood parametrization for doubly robust estimation of a causal effect with missing confounders
Authors:
Katherine Evans,
Isabel Fulcher,
Eric J. Tchetgen Tchetgen
Abstract:
Missing data and confounding are two problems researchers face in observational studies for comparative effectiveness. Williamson et al. (2012) recently proposed a unified approach to handle both issues concurrently using a multiply-robust (MR) methodology under the assumption that confounders are missing at random. Their approach considers a union of models in which any submodel has a parametric component while the remaining models are unrestricted. We show that while their estimating function is MR in theory, the possibility for multiply robust inference is complicated by the fact that parametric models for different components of the union model are not variation independent and therefore the MR property is unlikely to hold in practice. To address this, we propose an alternative transparent parametrization of the likelihood function, which makes explicit the model dependencies between various nuisance functions needed to evaluate the MR efficient score. The proposed method is genuinely doubly-robust (DR) in that it is consistent and asymptotically normal if one of two sets of modeling assumptions holds. We evaluate the performance and doubly robust property of the DR method via a simulation study.
Submitted 20 July, 2020;
originally announced July 2020.
-
Conditional separable effects
Authors:
Mats J. Stensrud,
James M. Robins,
Aaron Sarvet,
Eric J. Tchetgen Tchetgen,
Jessica G. Young
Abstract:
Researchers are often interested in treatment effects on outcomes that are only defined conditional on a post-treatment event status. For example, in a study of the effect of different cancer treatments on quality of life at end of follow-up, the quality of life of individuals who die during the study is undefined. In these settings, a naive contrast of outcomes conditional on the post-treatment variable is not an average causal effect, even in a randomized experiment. Therefore the effect in the principal stratum of those who would have the same value of the post-treatment variable regardless of treatment, such as the always survivors in a truncation by death setting, is often advocated for causal inference. While this principal stratum effect is a well defined causal contrast, it is often hard to justify that it is relevant to scientists, patients or policy makers, and it cannot be identified without relying on unfalsifiable assumptions. Here we formulate alternative estimands, the conditional separable effects, that have a natural causal interpretation under assumptions that can be falsified in a randomized experiment. We provide identification results and introduce different estimators, including a doubly robust estimator derived from the nonparametric influence function. As an illustration, we estimate a conditional separable effect of chemotherapies on quality of life in patients with prostate cancer, using data from a randomized clinical trial.
Submitted 7 June, 2021; v1 submitted 28 June, 2020;
originally announced June 2020.
-
Generalized interpretation and identification of separable effects in competing event settings
Authors:
Mats J. Stensrud,
Miguel A. Hernán,
Eric J. Tchetgen Tchetgen,
James M. Robins,
Vanessa Didelez,
Jessica G. Young
Abstract:
In competing event settings, a counterfactual contrast of cause-specific cumulative incidences quantifies the total causal effect of a treatment on the event of interest. However, effects of treatment on the competing event may indirectly contribute to this total effect, complicating its interpretation. We previously proposed the separable effects (Stensrud et al, 2019) to define direct and indirect effects of the treatment on the event of interest. This definition presupposes a treatment decomposition into two components acting along two separate causal pathways, one exclusively outside of the competing event and the other exclusively through it. Unlike previous definitions of direct and indirect effects, the separable effects can be subject to empirical scrutiny in a study where separate interventions on the treatment components are available. Here we extend and generalize the notion of the separable effects in several ways, allowing for interpretation, identification and estimation under considerably weaker assumptions. We propose and discuss a definition of separable effects that is applicable to general time-varying structures, where the separable effects can still be meaningfully interpreted, even when they cannot be regarded as direct and indirect effects. We further derive weaker conditions for identification of separable effects in observational studies where decomposed treatments are not yet available; in particular, these conditions allow for time-varying common causes of the event of interest, the competing events and loss to follow-up. For these general settings, we propose semi-parametric weighted estimators that are straightforward to implement. As an illustration, we apply the estimators to study the separable effects of intensive blood pressure therapy on acute kidney injury, using data from a randomized clinical trial.
Submitted 4 May, 2020; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Counterexamples to "The Blessings of Multiple Causes" by Wang and Blei
Authors:
Elizabeth L. Ogburn,
Ilya Shpitser,
Eric J. Tchetgen Tchetgen
Abstract:
This note has been updated (April, 2020) to respond to "Towards Clarifying the Theory of the Deconfounder" by Yixin Wang, David M. Blei (arXiv:2003.04948). This original note, posted in January, 2020, is meant to complement our previous comment on "The Blessings of Multiple Causes" by Wang and Blei (2019). We provide a more succinct and transparent explanation of the fact that the deconfounder does not control for multi-cause confounding. The argument given in Wang and Blei (2019) makes two mistakes: (1) attempting to infer independence conditional on one variable from independence conditional on a different, unrelated variable, and (2) attempting to infer joint independence from pairwise independence. We give two simple counterexamples to the deconfounder claim.
Submitted 24 April, 2020; v1 submitted 17 January, 2020;
originally announced January 2020.
-
A Semiparametric Approach to Model-based Sensitivity Analysis in Observational Studies
Authors:
Bo Zhang,
Eric J. Tchetgen Tchetgen
Abstract:
When drawing causal inference from observational data, there is always concern about unmeasured confounding. One way to tackle this is to conduct a sensitivity analysis. One widely-used sensitivity analysis framework hypothesizes the existence of a scalar unmeasured confounder U and asks how the causal conclusion would change were U measured and included in the primary analysis. Works along this line often make various parametric assumptions on U, for the sake of mathematical and computational simplicity. In this article, we further this line of research by developing a valid sensitivity analysis that leaves the distribution of U unrestricted. Our semiparametric estimator has three desirable features compared to many existing methods in the literature. First, our method allows for a larger and more flexible family of models, and mitigates observable implications (Franks et al., 2019). Second, our methods work seamlessly with any primary analysis that models the outcome regression parametrically. Third, our method is easy to use and interpret. We construct both pointwise confidence intervals and confidence bands that are uniformly valid over a given sensitivity parameter space, thus formally accounting for unknown sensitivity parameters. We apply our proposed method on an influential yet controversial study of the causal relationship between war experiences and political activeness using observational data from Uganda.
Submitted 19 June, 2022; v1 submitted 30 October, 2019;
originally announced October 2019.
-
Comment on "Blessings of Multiple Causes"
Authors:
Elizabeth L. Ogburn,
Ilya Shpitser,
Eric J. Tchetgen Tchetgen
Abstract:
(This comment has been updated to respond to Wang and Blei's rejoinder [arXiv:1910.07320].)
The premise of the deconfounder method proposed in "Blessings of Multiple Causes" by Wang and Blei [arXiv:1805.06826], namely that a variable that renders multiple causes conditionally independent also controls for unmeasured multi-cause confounding, is incorrect. This can be seen by noting that no fact about the observed data alone can be informative about ignorability, since ignorability is compatible with any observed data distribution. Methods to control for unmeasured confounding may be valid with additional assumptions in specific settings, but they cannot, in general, provide a checkable approach to causal inference, and they do not, in general, require weaker assumptions than the assumptions that are commonly used for causal inference. While this is outside the scope of this comment, we note that much recent work on applying ideas from latent variable modeling to causal inference problems suffers from similar issues.
Submitted 17 October, 2019; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Semiparametric Inference for Non-monotone Missing-Not-at-Random Data: the No Self-Censoring Model
Authors:
Daniel Malinsky,
Ilya Shpitser,
Eric J Tchetgen Tchetgen
Abstract:
We study the identification and estimation of statistical functionals of multivariate data missing non-monotonically and not-at-random, taking a semiparametric approach. Specifically, we assume that the missingness mechanism satisfies what has been previously called "no self-censoring" or "itemwise conditionally independent nonresponse," which roughly corresponds to the assumption that no partially-observed variable directly determines its own missingness status. We show that this assumption, combined with an odds ratio parameterization of the joint density, enables identification of functionals of interest, and we establish the semiparametric efficiency bound for the nonparametric model satisfying this assumption. We propose a practical augmented inverse probability weighted estimator, and in the setting with a (possibly high-dimensional) always-observed subset of covariates, our proposed estimator enjoys a certain double-robustness property. We explore the performance of our estimator with simulation experiments and on a previously-studied data set of HIV-positive mothers in Botswana.
Submitted 22 December, 2022; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Quantifying and Detecting Individual Level `Always Survivor' Causal Effects Under `Truncation by Death' and Censoring Through Time
Authors:
Jaffer M. Zaidi,
Eric J. Tchetgen Tchetgen,
Tyler J. VanderWeele
Abstract:
The analysis of causal effects when the outcome of interest is possibly truncated by death has a long history in statistics and causal inference. The survivor average causal effect is commonly identified with more assumptions than those guaranteed by the design of a randomized clinical trial or using sensitivity analysis. This paper demonstrates that individual level causal effects in the `always survivor' principal stratum can be identified with no stronger identification assumptions than randomization. We illustrate the practical utility of our methods using data from a clinical trial on patients with prostate cancer. Our methodology is the first and, as of yet, only proposed procedure that enables detecting causal effects in the presence of truncation by death using only the assumptions that are guaranteed by design of the clinical trial. This methodology is applicable to all types of outcomes.
Submitted 20 March, 2020; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Marginal Structural Models for Time-varying Endogenous Treatments: A Time-Varying Instrumental Variable Approach
Authors:
Eric J Tchetgen Tchetgen,
Haben Michael,
Yifan Cui
Abstract:
Robins (1998) introduced marginal structural models (MSMs), a general class of counterfactual models for the joint effects of time-varying treatment regimes in complex longitudinal studies subject to time-varying confounding. He established identification of MSM parameters under a sequential randomization assumption (SRA), which essentially rules out unmeasured confounding of treatment assignment over time. In this technical report, we consider sufficient conditions for identification of MSM parameters with the aid of a time-varying instrumental variable, when sequential randomization fails to hold due to unmeasured confounding. Our identification conditions essentially require that no unobserved confounder predicts compliance type for the time-varying treatment, the longitudinal generalization of the identifying condition of Wang and Tchetgen Tchetgen (2018). Under this assumption, we derive a large class of semiparametric estimators that extends standard inverse-probability weighting (IPW), the most popular approach for estimating MSMs under SRA, by incorporating the time-varying IV through a modified set of weights. The set of influence functions for MSM parameters is derived under a semiparametric model whose sole restriction on the observed data distribution is given by the MSM, and is shown to provide a rich class of multiply robust estimators, including a local semiparametric efficient estimator.
Submitted 14 September, 2018;
originally announced September 2018.
-
Doubly Robust Regression Analysis for Data Fusion
Authors:
Katherine Evans,
BaoLuo Sun,
James Robins,
Eric J. Tchetgen Tchetgen
Abstract:
This paper investigates the problem of making inference about a parametric model for the regression of an outcome variable $Y$ on covariates $(V,L)$ when data are fused from two separate sources, one which contains information only on $(V, Y)$ while the other contains information only on covariates. This data fusion setting may be viewed as an extreme form of missing data in which the probability of observing complete data $(V,L,Y)$ on any given subject is zero. We have developed a large class of semiparametric estimators, which includes doubly robust estimators, of the regression coefficients in fused data. The proposed method is DR in that it is consistent and asymptotically normal if, in addition to the model of interest, we correctly specify a model for either the data source process under an ignorability assumption, or the distribution of unobserved covariates. We evaluate the performance of our various estimators via an extensive simulation study, and apply the proposed methods to investigate the relationship between net asset value and total expenditure among U.S. households in 1998, while controlling for potential confounders including income and other demographic variables.
Submitted 7 September, 2018; v1 submitted 22 August, 2018;
originally announced August 2018.
-
A general approach to detect gene (G)-environment (E) additive interaction leveraging G-E independence in case-control studies
Authors:
Eric J. Tchetgen Tchetgen,
Xu Shi,
Tamar Sofer,
Benedict H. W. Wong
Abstract:
It is increasingly of interest in statistical genetics to test for the presence of a mechanistic interaction between genetic (G) and environmental (E) risk factors by testing for the presence of an additive GxE interaction. In case-control studies involving a rare disease, a statistical test of no additive interaction typically entails a test of no relative excess risk due to interaction (RERI). It is also well known that a test of multiplicative interaction exploiting G-E independence can be dramatically more powerful than standard logistic regression for case-control data. Likewise, it has recently been shown that a likelihood ratio test of a null RERI incorporating the G-E independence assumption (RERI-LRT) outperforms the standard RERI approach. In this paper, the authors describe a general, yet relatively straightforward approach to test for GxE additive interaction exploiting G-E independence. The approach, which relies on regression models for G and E, is particularly attractive because, unlike the RERI-LRT, it allows the regression model for the binary outcome to remain unrestricted. Therefore, the new methodology is completely robust to possible mis-specification in the outcome regression. This is particularly important for settings not easily handled by RERI-LRT, such as when E is a count or a continuous exposure with multiple components, or when there are several auxiliary covariates in the regression model. While the proposed approach avoids fitting an outcome regression, it nonetheless still allows for straightforward covariate adjustment. The methods are illustrated through an extensive simulation study and an ovarian cancer empirical application.
Submitted 17 August, 2018;
originally announced August 2018.
-
Multiply Robust Causal Inference with Double Negative Control Adjustment for Categorical Unmeasured Confounding
Authors:
Xu Shi,
Wang Miao,
Jennifer C. Nelson,
Eric J. Tchetgen Tchetgen
Abstract:
Unmeasured confounding is a threat to causal inference in observational studies. In recent years, use of negative controls to mitigate unmeasured confounding has gained increasing recognition and popularity. Negative controls have a longstanding tradition in laboratory sciences and epidemiology to rule out non-causal explanations, although they have been used primarily for bias detection. Recently, Miao et al. (2018) have described sufficient conditions under which a pair of negative control exposure and outcome variables can be used to nonparametrically identify the average treatment effect (ATE) from observational data subject to uncontrolled confounding. In this paper, we establish nonparametric identification of the ATE under weaker conditions in the case of categorical unmeasured confounding and negative control variables. We also provide a general semiparametric framework for obtaining inferences about the ATE while leveraging information about a possibly large number of measured covariates. In particular, we derive the semiparametric efficiency bound in the nonparametric model, and we propose multiply robust and locally efficient estimators when nonparametric estimation may not be feasible. We assess the finite sample performance of our methods in extensive simulation studies. Finally, we illustrate our methods with an application to the postlicensure surveillance of vaccine safety among children.
Submitted 4 September, 2019; v1 submitted 14 August, 2018;
originally announced August 2018.
-
Estimation of natural indirect effects robust to unmeasured confounding and mediator measurement error
Authors:
Isabel R. Fulcher,
Xu Shi,
Eric J. Tchetgen Tchetgen
Abstract:
The use of causal mediation analysis to evaluate the pathways by which an exposure affects an outcome is widespread in the social and biomedical sciences. Recent advances in this area have established formal conditions for identification and estimation of natural direct and indirect effects. However, these conditions typically involve stringent no unmeasured confounding assumptions and that the mediator has been measured without error. These assumptions may fail to hold in practice where mediation methods are often applied. The goal of this paper is two-fold. First, we show that the natural indirect effect can in fact be identified in the presence of unmeasured exposure-outcome confounding provided there is no additive interaction between the mediator and unmeasured confounder(s). Second, we introduce a new estimator of the natural indirect effect that is robust to both classical measurement error of the mediator and unmeasured confounding of both exposure-outcome and mediator-outcome relations under certain no interaction assumptions. We provide formal proofs and a simulation study to demonstrate our results.
Submitted 10 August, 2018;
originally announced August 2018.
-
A causal framework for classical statistical estimands in failure time settings with competing events
Authors:
Jessica G. Young,
Mats J. Stensrud,
Eric J. Tchetgen Tchetgen,
Miguel A. Hernán
Abstract:
In failure-time settings, a competing risk event is any event that makes it impossible for the event of interest to occur. For example, cardiovascular disease death is a competing event for prostate cancer death because an individual cannot die of prostate cancer once he has died of cardiovascular disease. Various statistical estimands have been defined as possible targets of inference in the classical competing risks literature. Many reviews have described these statistical estimands and their estimating procedures with recommendations about their use. However, this previous work has not used a formal framework for characterizing causal effects and their identifying conditions, which makes it difficult to interpret effect estimates and assess recommendations regarding analytic choices. Here we use a counterfactual framework to explicitly define each of these classical estimands. We clarify that, depending on whether competing events are defined as censoring events, contrasts of risks can define a total effect of the treatment on the event of interest, or a direct effect of the treatment on the event of interest not mediated through the competing event. In contrast, regardless of whether competing events are defined as censoring events, counterfactual hazard contrasts cannot generally be interpreted as causal effects. We illustrate how identifying assumptions for all of these counterfactual estimands can be represented in causal diagrams in which competing events are depicted as time-varying covariates. We present an application of these ideas to data from a randomized trial designed to estimate the effect of estrogen therapy on prostate cancer mortality.
Submitted 6 November, 2019; v1 submitted 15 June, 2018;
originally announced June 2018.
-
Robust inference on population indirect causal effects: the generalized front-door criterion
Authors:
Isabel R. Fulcher,
Ilya Shpitser,
Stella Marealle,
Eric J. Tchetgen Tchetgen
Abstract:
Standard methods for inference about direct and indirect effects require stringent no unmeasured confounding assumptions which often fail to hold in practice, particularly in observational studies. The goal of this paper is to introduce a new form of indirect effect, the population intervention indirect effect (PIIE), that can be nonparametrically identified in the presence of an unmeasured common cause of exposure and outcome. This new type of indirect effect captures the extent to which the effect of exposure is mediated by an intermediate variable under an intervention that holds the component of exposure directly influencing the outcome at its observed value. The PIIE is in fact the indirect component of the population intervention effect, introduced by Hubbard and Van der Laan (2008). Interestingly, our identification criterion generalizes Judea Pearl's front-door criterion as it does not require no direct effect of exposure not mediated by the intermediate variable. For inference, we develop both parametric and semiparametric methods, including a novel doubly robust semiparametric locally efficient estimator, that perform very well in simulation studies. Finally, the proposed methods are used to measure the effectiveness of monetary saving recommendations among women enrolled in a maternal health program in Tanzania.
Submitted 23 September, 2019; v1 submitted 9 November, 2017;
originally announced November 2017.
-
On semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding
Authors:
Caleb H. Miles,
Ilya Shpitser,
Phyllis Kanki,
Seema Meloni,
Eric J. Tchetgen Tchetgen
Abstract:
Path-specific effects are a broad class of mediated effects from an exposure to an outcome via one or more causal pathways with respect to some subset of intermediate variables. The majority of the literature concerning estimation of mediated effects has focused on parametric models with stringent assumptions regarding unmeasured confounding. We consider semiparametric inference of a path-specific effect when these assumptions are relaxed. In particular, we develop a suite of semiparametric estimators for the effect along a pathway through a mediator, but not some exposure-induced confounder of that mediator. These estimators have different robustness properties, as each depends on different parts of the observed data likelihood. One of our estimators may be viewed as combining the others, because it is locally semiparametric efficient and multiply robust. The latter property is illustrated in a simulation study. We apply our methodology to an HIV study, in which we estimate the effect comparing two drug treatments on a patient's average log CD4 count mediated by the patient's level of adherence, but not by previous experience of toxicity, which is clearly affected by which treatment the patient is assigned to, and may confound the effect of the patient's level of adherence on their virologic outcome.
Submitted 3 October, 2017;
originally announced October 2017.
-
The GENIUS Approach to Robust Mendelian Randomization Inference
Authors:
Eric J. Tchetgen Tchetgen,
BaoLuo Sun,
Stefan Walter
Abstract:
Mendelian randomization (MR) is a popular instrumental variable (IV) approach, in which one or several genetic markers serve as IVs that can sometimes be leveraged to recover valid inferences about a given exposure-outcome causal association subject to unmeasured confounding. A key IV identification condition known as the exclusion restriction states that the IV cannot have a direct effect on the outcome which is not mediated by the exposure in view. In MR studies, such an assumption requires an unrealistic level of prior knowledge about the mechanism by which genetic markers causally affect the outcome. As a result, possible violation of the exclusion restriction can seldom be ruled out in practice. To address this concern, we introduce a new class of IV estimators which are robust to violation of the exclusion restriction under data generating mechanisms commonly assumed in MR literature. The proposed approach named "MR G-Estimation under No Interaction with Unmeasured Selection" (MR GENIUS) improves on Robins' G-estimation by making it robust to both additive unmeasured confounding and violation of the exclusion restriction assumption. In certain key settings, MR GENIUS reduces to the estimator of Lewbel (2012) which is widely used in econometrics but appears largely unappreciated in MR literature. More generally, MR GENIUS generalizes Lewbel's estimator to several key practical MR settings, including multiplicative causal models for binary outcome, multiplicative and odds ratio exposure models, case control study design and censored survival outcomes.
Submitted 2 June, 2019; v1 submitted 22 September, 2017;
originally announced September 2017.
-
Auto-G-Computation of Causal Effects on a Network
Authors:
Eric J. Tchetgen Tchetgen,
Isabel Fulcher,
Ilya Shpitser
Abstract:
Methods for inferring average causal effects have traditionally relied on two key assumptions: (i) the intervention received by one unit cannot causally influence the outcome of another; and (ii) units can be organized into non-overlapping groups such that outcomes of units in separate groups are independent. In this paper, we develop new statistical methods for causal inference based on a single realization of a network of connected units for which neither assumption (i) nor (ii) holds. The proposed approach allows both for arbitrary forms of interference, whereby the outcome of a unit may depend on interventions received by other units with whom a network path through connected units exists; and long range dependence, whereby outcomes for any two units likewise connected by a path in the network may be dependent. Under network versions of consistency and no unobserved confounding, inference is made tractable by an assumption that the network's outcome, treatment and covariate vectors are a single realization of a certain chain graph model. This assumption allows inferences about various network causal effects via the auto-g-computation algorithm, a network generalization of Robins' well-known g-computation algorithm previously described for causal inference under assumptions (i) and (ii).
Submitted 22 August, 2019; v1 submitted 5 September, 2017;
originally announced September 2017.
-
A Class of Semiparametric Tests of Treatment Effect Robust to Confounder Classical Measurement Error
Authors:
Caleb H. Miles,
Joel Schwartz,
Eric J. Tchetgen Tchetgen
Abstract:
When assessing the presence of an exposure causal effect on a given outcome, it is well known that classical measurement error of the exposure can reduce the power of a test of the null hypothesis in question, although its type I error rate will generally remain at the nominal level. In contrast, classical measurement error of a confounder can inflate the type I error rate of a test of treatment effect. In this paper, we develop a large class of semiparametric test statistics of an exposure causal effect, which are completely robust to classical measurement error of a subset of confounders. A unique and appealing feature of our proposed methods is that they require no external information such as validation data or replicates of error-prone confounders. We present a doubly-robust form of this test that requires only one of two models to be correctly specified for the resulting test statistic to have correct type I error rate. We demonstrate validity and power within our class of test statistics through simulation studies. We apply the methods to a multi-U.S.-city, time-series data set to test for an effect of temperature on mortality while adjusting for atmospheric particulate matter with diameter of 2.5 micrometres or less (PM2.5), which is known to be measured with error.
Submitted 17 October, 2016;
originally announced October 2016.
-
Instrumental variables estimation of exposure effects on a time-to-event response using structural cumulative survival models
Authors:
T. Martinussen,
S. Vansteelandt,
E. J. Tchetgen Tchetgen,
D. M. Zucker
Abstract:
The use of instrumental variables for estimating the effect of an exposure on an outcome is popular in econometrics, and increasingly so in epidemiology. This increasing popularity may be attributed to the natural occurrence of instrumental variables in observational studies that incorporate elements of randomization, either by design or by nature (e.g., random inheritance of genes). Instrumental variables estimation of exposure effects is well established for continuous outcomes and to some extent for binary outcomes. It is, however, largely lacking for time-to-event outcomes because of complications due to censoring and survivorship bias. In this paper, we make a novel proposal under a class of structural cumulative survival models which parameterize time-varying effects of a point exposure directly on the scale of the survival function; these models are essentially equivalent with a semi-parametric variant of the instrumental variables additive hazards model. We propose a class of recursive instrumental variable estimators for these exposure effects, and derive their large sample properties along with inferential tools. We examine the performance of the proposed method in simulation studies and illustrate it in a Mendelian randomization study to evaluate the effect of diabetes on mortality using data from the Health and Retirement Study. We further use the proposed method to investigate potential benefit from breast cancer screening on subsequent breast cancer mortality based on the HIP-study.
Submitted 2 August, 2016;
originally announced August 2016.
-
Discrete Choice Models for Nonmonotone Nonignorable Missing Data: Identification and Inference
Authors:
Eric J. Tchetgen Tchetgen,
Linbo Wang,
BaoLuo Sun
Abstract:
Nonmonotone missing data arise routinely in empirical studies of social and health sciences, and when ignored, can induce selection bias and loss of efficiency. In practice, it is common to account for nonresponse under a missing-at-random assumption which although convenient, is rarely appropriate when nonresponse is nonmonotone. Likelihood and Bayesian missing data methodologies often require specification of a parametric model for the full data law, thus a priori ruling out any prospect for semiparametric inference. In this paper, we propose an all-purpose approach which delivers semiparametric inferences when missing data are nonmonotone and not at random. The approach is based on a discrete choice model (DCM) as a means to generate a large class of nonmonotone nonresponse mechanisms that are nonignorable. Sufficient conditions for nonparametric identification are given, and a general framework for fully parametric and semiparametric inference under an arbitrary DCM is proposed. Special consideration is given to the case of logit discrete choice nonresponse model (LDCM) for which we describe generalizations of inverse-probability weighting, pattern-mixture estimation, doubly robust estimation and multiply robust estimation.
Submitted 19 July, 2017; v1 submitted 9 July, 2016;
originally announced July 2016.
-
On Partial Identification of the Pure Direct Effect
Authors:
Caleb H. Miles,
Phyllis Kanki,
Seema Meloni,
Eric J. Tchetgen Tchetgen
Abstract:
In causal mediation analysis, nonparametric identification of the pure (natural) direct effect typically relies on, in addition to no unobserved pre-exposure confounding, fundamental assumptions of (i) so-called "cross-world-counterfactuals" independence and (ii) no exposure-induced confounding. When the mediator is binary, bounds for partial identification have been given when neither assumption is made, or alternatively when assuming only (ii). We extend existing bounds to the case of a polytomous mediator, and provide bounds for the case assuming only (i). We apply these bounds to data from the Harvard PEPFAR program in Nigeria, where we evaluate the extent to which the effects of antiretroviral therapy on virological failure are mediated by a patient's adherence, and show that inference on this effect is somewhat sensitive to model assumptions.
Submitted 4 September, 2015;
originally announced September 2015.
-
Interference and Sensitivity Analysis
Authors:
Tyler J. VanderWeele,
Eric J. Tchetgen Tchetgen,
M. Elizabeth Halloran
Abstract:
Causal inference with interference is a rapidly growing area. The literature has begun to relax the "no-interference" assumption that the treatment received by one individual does not affect the outcomes of other individuals. In this paper we briefly review the literature on causal inference in the presence of interference when treatments have been randomized. We then consider settings in which causal effects in the presence of interference are not identified, either because randomization alone does not suffice for identification or because treatment is not randomized and there may be unmeasured confounders of the treatment-outcome relationship. We develop sensitivity analysis techniques for these settings. We describe several sensitivity analysis techniques for the infectiousness effect which, in a vaccine trial, captures the effect of the vaccine of one person on protecting a second person from infection even if the first is infected. We also develop two sensitivity analysis techniques for causal effects under interference in the presence of unmeasured confounding which generalize analogous techniques when interference is absent. These two techniques for unmeasured confounding are compared and contrasted.
Submitted 5 March, 2015;
originally announced March 2015.
-
On Inverse Probability Weighting for Nonmonotone Missing at Random Data
Authors:
BaoLuo Sun,
Eric J. Tchetgen Tchetgen
Abstract:
The development of coherent missing data models to account for nonmonotone missing at random (MAR) data by inverse probability weighting (IPW) remains to date largely unresolved. As a consequence, IPW has essentially been restricted for use only in monotone missing data settings. We propose a class of models for nonmonotone missing data mechanisms that spans the MAR model, while allowing the underlying full data law to remain unrestricted. For parametric specifications within the proposed class, we introduce an unconstrained maximum likelihood estimator for estimating the missing data probabilities which can be easily implemented using existing software. To circumvent potential convergence issues with this procedure, we also introduce a Bayesian constrained approach to estimate the missing data process which is guaranteed to yield inferences that respect all model restrictions. The efficiency of the standard IPW estimator is improved by incorporating information from incomplete cases through an augmented estimating equation which is optimal within a large class of estimating equations. We investigate the finite-sample properties of the proposed estimators in a simulation study and illustrate the new methodology in an application evaluating key correlates of preterm delivery for infants born to HIV infected mothers in Botswana, Africa.
Submitted 17 October, 2015; v1 submitted 19 November, 2014;
originally announced November 2014.
-
Control Function Assisted IPW Estimation with a Secondary Outcome in Case-Control Studies
Authors:
Tamar Sofer,
Marilyn C. Cornelis,
Peter Kraft,
Eric J. Tchetgen Tchetgen
Abstract:
Case-control studies are designed towards studying associations between risk factors and a single, primary outcome. Information about additional, secondary outcomes is also collected, but association studies targeting such secondary outcomes should account for the case-control sampling scheme, or otherwise results may be biased. Often, one uses inverse probability weighted (IPW) estimators to estimate population effects in such studies. However, these estimators are inefficient relative to estimators that make additional assumptions about the data generating mechanism. We propose a class of estimators for the effect of risk factors on a secondary outcome in case-control studies, when the mean is modeled using either the identity or the log link. The proposed estimator combines IPW with a mean zero control function that depends explicitly on a model for the primary disease outcome. The efficient estimator in our class of estimators reduces to standard IPW when the model for the primary disease outcome is unrestricted, and is more efficient than standard IPW when the model is either parametric or semiparametric.
Submitted 16 July, 2014;
originally announced July 2014.
-
Flexible covariate-adjusted exact tests of randomized treatment effects with application to a trial of HIV education
Authors:
Alisa J. Stephens,
Eric J. Tchetgen Tchetgen,
Victor De Gruttola
Abstract:
The primary goal of randomized trials is to compare the effects of different interventions on some outcome of interest. In addition to the treatment assignment and outcome, data on baseline covariates, such as demographic characteristics or biomarker measurements, are typically collected. Incorporating such auxiliary covariates in the analysis of randomized trials can increase power, but questions remain about how to preserve type I error when incorporating such covariates in a flexible way, particularly when the number of randomized units is small. Using the Young Citizens study, a cluster-randomized trial of an educational intervention to promote HIV awareness, we compare several methods to evaluate intervention effects when baseline covariates are incorporated adaptively. To ascertain the validity of the methods shown in small samples, extensive simulation studies were conducted. We demonstrate that randomization inference preserves type I error under model selection while tests based on asymptotic theory may yield invalid results. We also demonstrate that covariate adjustment generally increases power, except at extremely small sample sizes using liberal selection procedures. Although shown within the context of HIV prevention research, our conclusions have important implications for maximizing efficiency and robustness in randomized trials with small samples across disciplines.
Submitted 8 January, 2014;
originally announced January 2014.