Search | arXiv e-print repository

arXiv:2508.18137 [pdf, ps, other]

Estimating the average treatment effect in cluster-randomized trials with misclassified outcomes and non-random validation subsets

Authors: Dane Isenberg, Nandita Mitra, Steven C. Marcus, Rinad S. Beidas, Kristin A. Linn

Abstract: Randomized trials are viewed as the benchmark for assessing causal effects of treatments on outcomes of interest. Nonetheless, challenges such as measurement error can undermine the standard causal assumptions for randomized trials. In ASPIRE, a cluster-randomized trial, pediatric primary care clinics were assigned to one of two treatments aimed at promoting clinician delivery of a secure firearm program to parents during well-child visits. A key outcome of interest is thus parent receipt of the program at each visit. Clinicians documented program delivery in patients' electronic health records for all visits, but their reporting is a proxy measure for the parent receipt outcome. Parents were also surveyed to report directly on program receipt after their child's visit; however, only a small subset of them completed the survey. Here, we develop a causal inference framework for a binary outcome that is subject to misclassification through silver-standard measures (clinician reports), but gold-standard measures (parent reports) are only available for a non-random internal validation subset. We propose a method for identifying the average treatment effect (ATE) that addresses the risk of bias due to misclassification and non-random validation selection, even when the outcome (parent receipt) may directly impact selection propensity (survey responsiveness). We show that ATE estimation relies on specifying the relationship between the gold- and silver-standard outcome measures in the validation subset, which may depend on treatment and covariates. Additionally, the clustered design is reflected in our causal assumptions and in our cluster-robust approach to estimation of the ATE. Simulation studies demonstrate acceptable finite-sample operating characteristics of our ATE estimator, supporting its application to ASPIRE.

Submitted 26 August, 2025; v1 submitted 25 August, 2025; originally announced August 2025.

Comments: corrected very minor typos in tables/figs and one small change to abstract

arXiv:2508.02970 [pdf, ps, other]

Bayesian Sensitivity Analyses for Policy Evaluation with Difference-in-Differences under Violations of Parallel Trends

Authors: Seong Woo Han, Nandita Mitra, Gary Hettinger, Arman Oganisian

Abstract: Violations of the parallel trends assumption pose significant challenges for causal inference in difference-in-differences (DiD) studies, especially in policy evaluations where pre-treatment dynamics and external shocks may bias estimates. In this work, we propose a Bayesian DiD framework to allow us to estimate the effect of policies when parallel trends is violated. To address potential deviations from the parallel trends assumption, we introduce a formal sensitivity parameter representing the extent of the violation, specify an autoregressive AR(1) prior on this term to robustly model temporal correlation, and explore a range of prior specifications - including fixed, fully Bayesian, and empirical Bayes (EB) approaches calibrated from pre-treatment data. By systematically comparing posterior treatment effect estimates across prior configurations when evaluating Philadelphia's sweetened beverage tax using Baltimore as a control, we show how Bayesian sensitivity analyses support robust and interpretable policy conclusions under violations of parallel trends.

Submitted 4 August, 2025; originally announced August 2025.

arXiv:2505.21447 [pdf, ps, other]

A Bayesian approach to the survivor average causal effect in cluster-randomized crossover trials

Authors: Dane Isenberg, Michael O. Harhay, Andrew B. Forbes, Paul J. Young, Fan Li, Nandita Mitra

Abstract: In cluster-randomized crossover (CRXO) trials, groups of individuals are randomly assigned to two or more sequences of alternating treatments. Since clusters act as their own control, the CRXO design is typically more statistically efficient than the usual parallel-arm trial. CRXO trials are increasingly popular in many areas of health research where the number of available clusters is limited. Further, in trials among severely ill patients, researchers often want to assess the effect of treatments on secondary non-terminal outcomes, but frequently in these studies, there are patients who do not survive to have these measurements fully recorded. In this paper, we provide a causal inference framework and treatment effect estimation methods for addressing truncation by death in the setting of CRXO trials. We target the survivor average causal effect (SACE) estimand, a well-defined subgroup treatment effect obtained via principal stratification. We propose novel structural and standard modeling assumptions to enable SACE identification followed by estimation within a Bayesian paradigm. We evaluate the small-sample performance of our proposed Bayesian approach for the estimation of the SACE in CRXO trial settings via simulation studies. We apply our methods to a previously conducted two-period cross-sectional CRXO study examining the impact of proton pump inhibitors compared to histamine-2 receptor blockers on length of hospitalization among adults requiring invasive mechanical ventilation.

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2408.16670 [pdf, ps, other]

A Causal Framework for Evaluating Drivers of Policy Effect Heterogeneity Using Difference-in-Differences

Authors: Gary Hettinger, Youjin Lee, Nandita Mitra

Abstract: Policymakers and researchers often seek to understand how a policy differentially affects a population and the pathways driving this heterogeneity. For example, when studying an excise tax on sweetened beverages, researchers might assess the roles of cross-border shopping, economic competition, and store-level price changes on beverage sales trends. However, traditional policy evaluation tools, like the difference-in-differences (DiD) approach, primarily target average effects of the observed intervention rather than the underlying drivers of effect heterogeneity. Common approaches to evaluate sources of heterogeneity often lack a causal framework, making it difficult to determine whether observed outcome differences are truly driven by the proposed source of heterogeneity or by other confounding factors. In this paper, we present a framework for evaluating such policy drivers by representing questions of effect heterogeneity under hypothetical interventions and use it to evaluate drivers of the Philadelphia sweetened beverage tax policy effects. Building on recent advancements in estimating causal effect curves under DiD designs, we provide tools to assess policy effect heterogeneity while addressing practical challenges including confounding and neighborhood dynamics.

Submitted 12 March, 2025; v1 submitted 29 August, 2024; originally announced August 2024.

arXiv:2407.15525 [pdf, other]

Multiple Importance Sampling for Stochastic Gradient Estimation

Authors: Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh

Abstract: We introduce a theoretical and practical framework for efficient importance sampling of mini-batch samples for gradient estimation from single and multiple probability distributions. To handle noisy gradients, our framework dynamically evolves the importance distribution during training by utilizing a self-adaptive metric. Our framework combines multiple, diverse sampling distributions, each tailored to specific parameter gradients. This approach facilitates the importance sampling of vector-valued gradient estimation. Rather than naively combining multiple distributions, our framework involves optimally weighting data contribution across multiple distributions. This adapted combination of multiple importance yields superior gradient estimates, leading to faster training convergence. We demonstrate the effectiveness of our approach through empirical evaluations across a range of optimization tasks like classification and regression on both image and point cloud datasets.

Submitted 28 January, 2025; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: 13 pages, 11 figures

arXiv:2404.10629 [pdf, other]

Weighting methods for truncation by death in cluster-randomized trials

Authors: Dane Isenberg, Michael Harhay, Nandita Mitra, Fan Li

Abstract: Patient-centered outcomes, such as quality of life and length of hospital stay, are the focus in a wide array of clinical studies. However, participants in randomized trials for elderly or critically and severely ill patient populations may have truncated or undefined non-mortality outcomes if they do not survive through the measurement time point. To address truncation by death, the survivor average causal effect (SACE) has been proposed as a causally interpretable subgroup treatment effect defined under the principal stratification framework. However, the majority of methods for estimating SACE have been developed in the context of individually-randomized trials. Only limited discussions have been centered around cluster-randomized trials (CRTs), where methods typically involve strong distributional assumptions for outcome modeling. In this paper, we propose two weighting methods to estimate SACE in CRTs that obviate the need for potentially complicated outcome distribution modeling. We establish the requisite assumptions that address latent clustering effects to enable point identification of SACE, and we provide computationally-efficient asymptotic variance estimators for each weighting estimator. In simulations, we evaluate our weighting estimators, demonstrating their finite-sample operating characteristics and robustness to certain departures from the identification assumptions. We illustrate our methods using data from a CRT to assess the impact of a sedation protocol on mechanical ventilation among children with acute respiratory failure.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Code for simulations and R package is available on https://github.com/abcdane1/PtSaceCrts

arXiv:2401.14355 [pdf, ps, other]

doi 10.1093/biomtc/ujaf015

Multiply Robust Difference-in-Differences Estimation of Causal Effect Curves for Continuous Exposures

Authors: Gary Hettinger, Youjin Lee, Nandita Mitra

Abstract: Researchers commonly use difference-in-differences (DiD) designs to evaluate public policy interventions. While methods exist for estimating effects in the context of binary interventions, policies often result in varied exposures across regions implementing the policy. Yet, existing approaches for incorporating continuous exposures face substantial limitations in addressing confounding variables associated with intervention status, exposure levels, and outcome trends. These limitations significantly constrain policymakers' ability to fully comprehend policy impacts and design future interventions. In this work, we propose new estimators for causal effect curves within the DiD framework, accounting for multiple sources of confounding. Our approach accommodates misspecification of a subset of treatment, exposure, and outcome models while avoiding any parametric assumptions on the effect curve. We present the statistical properties of the proposed methods and illustrate their application through simulations and a study investigating the heterogeneous effects of a nutritional excise tax under different levels of accessibility to cross-border shopping.

Submitted 15 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2303.06227 [pdf, other]

Policy effect evaluation under counterfactual neighborhood interventions in the presence of spillover

Authors: Youjin Lee, Gary Hettinger, Nandita Mitra

Abstract: Policy interventions can spill over to units of a population that are not directly exposed to the policy but are geographically close to the units receiving the intervention. In recent work, investigations of spillover effects on neighboring regions have focused on estimating the average treatment effect of a particular policy in an observed setting. Our research question broadens this scope by asking what policy consequences the treated units would have experienced under hypothetical exposure settings. When we only observe treated unit(s) surrounded by controls -- as is common when a policy intervention is implemented in a single city or state -- this effect inquires about the policy effects under a counterfactual neighborhood policy status that we do not, in actuality, observe. In this work, we extend difference-in-differences (DiD) approaches to spillover settings and develop identification conditions required to evaluate policy effects in counterfactual treatment scenarios. These causal quantities are policy-relevant for designing effective policies for populations subject to various neighborhood statuses. We develop doubly robust estimators and use extensive numerical experiments to examine their performance under heterogeneous spillover effects. We apply our proposed method to investigate the effect of the Philadelphia beverage tax on unit sales.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2301.06697 [pdf, ps, other]

doi 10.1093/jrsssc/qlae066

Estimation of Policy-Relevant Causal Effects in the Presence of Interference with an Application to the Philadelphia Beverage Tax

Authors: Gary Hettinger, Christina Roberto, Youjin Lee, Nandita Mitra

Abstract: To comprehensively evaluate a public policy intervention, researchers must consider the effects of the policy not just on the implementing region, but also nearby, indirectly-affected regions. For example, an excise tax on sweetened beverages in Philadelphia was shown to not only be associated with a decrease in volume sales of taxed beverages in Philadelphia, but also an increase in sales in bordering counties not subject to the tax. The latter association may be explained by cross-border shopping behaviors of Philadelphia residents and indicate a causal effect of the tax on nearby regions, which may offset the total effect of the intervention. To estimate causal effects in this setting, we extend difference-in-differences methodology to account for such interference between regions and adjust for potential confounding present in quasi-experimental evaluations. Our doubly robust estimators for the average treatment effect on the treated and neighboring control relax standard assumptions on interference and model specification. We apply these methods to evaluate the change in volume sales of taxed beverages in 231 Philadelphia and bordering county stores due to the Philadelphia beverage tax. We also use our methods to explore the heterogeneity of effects across geographic features.

Submitted 1 February, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Journal ref: Journal of the Royal Statistical Society Series C (2024)

arXiv:2205.05624 [pdf, other]

Leveraging baseline covariates to analyze small cluster-randomized trials with a rare binary outcome

Authors: Angela Y. Zhu, Nandita Mitra, Karla Hemming, Michael O. Harhay, Fan Li

Abstract: Cluster-randomized trials (CRTs) involve randomizing entire groups of participants -- called clusters -- to treatment arms but are often comprised of a limited or fixed number of available clusters. While covariate adjustment can account for chance imbalances between treatment arms and increase statistical efficiency in individually-randomized trials, analytical methods for individual-level covariate adjustment in small CRTs have received little attention to date. In this paper, we systematically investigate, through extensive simulations, the operating characteristics of propensity score weighting and multivariable regression as two individual-level covariate adjustment strategies for estimating the participant-average causal effect in small CRTs with a rare binary outcome and identify scenarios where each adjustment strategy has a relative efficiency advantage over the other to make practical recommendations. We also examine the finite-sample performance of the bias-corrected sandwich variance estimators associated with propensity score weighting and multivariable regression for quantifying the uncertainty in estimating the participant-average treatment effect. To illustrate the methods for individual-level covariate adjustment, we reanalyze a recent CRT testing a sedation protocol in 31 pediatric intensive care units.

Submitted 28 November, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

arXiv:2110.10266 [pdf, other]

Addressing Positivity Violations in Causal Effect Estimation using Gaussian Process Priors

Authors: Yaqian Zhu, Nandita Mitra, Jason Roy

Abstract: In observational studies, causal inference relies on several key identifying assumptions. One identifiability condition is the positivity assumption, which requires the probability of treatment be bounded away from 0 and 1. That is, for every covariate combination, it should be possible to observe both treated and control subjects, i.e., the covariate distributions should overlap between treatment arms. If the positivity assumption is violated, population-level causal inference necessarily involves some extrapolation. Ideally, a greater amount of uncertainty about the causal effect estimate should be reflected in such situations. With that goal in mind, we construct a Gaussian process model for estimating treatment effects in the presence of practical violations of positivity. Advantages of our method include minimal distributional assumptions, a cohesive model for estimating treatment effects, and more uncertainty associated with areas in the covariate space where there is less overlap. We assess the performance of our approach with respect to bias and efficiency using simulation studies. The method is then applied to a study of critically ill female patients to examine the effect of undergoing right heart catheterization.

Submitted 17 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

arXiv:2108.08756 [pdf, other]

Combining Real-World and Randomized Control Trial Data Using Data-Adaptive Weighting via the On-Trial Score

Authors: Joanna Harton, Brian Segal, Ronac Mamtani, Nandita Mitra, Rebecca Hubbard

Abstract: Clinical trials with a hybrid control arm (a control arm constructed from a combination of randomized patients and real-world data on patients receiving usual care in standard clinical practice) have the potential to decrease the cost of randomized trials while increasing the proportion of trial patients given access to novel therapeutics. However, due to stringent trial inclusion criteria and differences in care and data quality between trials and community practice, trial patients may have systematically different outcomes compared to their real-world counterparts. We propose a new method for analyses of trials with a hybrid control arm that efficiently controls bias and type I error. Under our proposed approach, selected real-world patients are weighted by a function of the "on-trial score," which reflects their similarity to trial patients. In contrast to previously developed hybrid control designs that assign the same weight to all real-world patients, our approach upweights real-world patients who more closely resemble randomized control patients while dissimilar patients are discounted. Estimates of the treatment effect are obtained via Cox proportional hazards models. We compare our approach to existing approaches via simulations and apply these methods to a study using electronic health record data. Our proposed method is able to control type I error, minimize bias, and decrease variance when compared to using only trial data in nearly all scenarios examined. Therefore, our new approach can be used when conducting clinical trials by augmenting the standard-of-care arm with weighted patients from the EHR to increase power without inducing bias.

Submitted 19 August, 2021; originally announced August 2021.

Comments: Presented at JSM 2020, ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop 2020, submitted to Pharmaceutical Statistics on 8/17/21

arXiv:2107.03441 [pdf, other]

Identifying optimally cost-effective dynamic treatment regimes with a Q-learning approach

Authors: Nicholas Illenberger, Andrew J. Spieker, Nandita Mitra

Abstract: Health policy decisions regarding patient treatment strategies require consideration of both treatment effectiveness and cost. Optimizing treatment rules with respect to effectiveness may result in prohibitively expensive strategies; on the other hand, optimizing with respect to costs may result in poor patient outcomes. We propose a two-step approach for identifying an optimally cost-effective and interpretable dynamic treatment regime. First, we develop a combined Q-learning and policy-search approach to estimate an optimal list-based regime under a constraint on expected treatment costs. Second, we propose an iterative procedure to select an optimally cost-effective regime from a set of candidate regimes corresponding to different cost constraints. Our approach can estimate optimal regimes in the presence of time-varying confounding, censoring, and correlated outcomes. Through simulation studies, we illustrate the validity of estimated treatment regimes and examine operating characteristics under flexible modeling approaches. We also apply our methodology to evaluate optimally cost-effective treatment strategies for assigning adjuvant therapies to endometrial cancer patients.

Submitted 18 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: 16 pages, 4 tables, 1 figure

arXiv:2101.10466 [pdf, other]

A regression framework for a probabilistic measure of cost-effectiveness

Authors: Nicholas Illenberger, Nandita Mitra, Andrew J. Spieker

Abstract: To make informed health policy decisions regarding a treatment, we must consider both its cost and its clinical effectiveness. In past work, we introduced the net benefit separation (NBS) as a novel measure of cost-effectiveness. The NBS is a probabilistic measure that characterizes the extent to which a treated patient will be more likely to experience benefit as compared to an untreated patient. Due to variation in treatment response across patients, uncovering factors that influence cost-effectiveness can assist policy makers in population-level decisions regarding resource allocation. In this paper, we introduce a regression framework for NBS in order to estimate covariate-specific NBS and find determinants of variation in NBS. Our approach is able to accommodate informative cost censoring through inverse probability weighting techniques, and addresses confounding through a semiparametric standardization procedure. Through simulations, we show that NBS regression performs well in a variety of common scenarios. We apply our proposed regression procedure to a realistic simulated data set as an illustration of how our approach could be used to investigate the association between cancer stage, comorbidities and cost-effectiveness when comparing adjuvant radiation therapy and chemotherapy in post-hysterectomy endometrial cancer patients.

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2009.10839 [pdf, other]

doi 10.1515/ijb-2022-0051

Hierarchical Bayesian Bootstrap for Heterogeneous Treatment Effect Estimation

Authors: Arman Oganisian, Nandita Mitra, Jason Roy

Abstract: A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE) - average treatment effects within strata of another variable of interest such as levels of a biomarker, education, or age strata. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum - which itself must be estimated. Standard practice involves estimating these stratum-specific confounder distributions independently (e.g. via the empirical distribution or Rubin's Bayesian bootstrap), which becomes problematic for sparsely populated strata with few observed confounder vectors. In this paper, we develop a nonparametric hierarchical Bayesian bootstrap (HBB) prior over the stratum-specific confounder distributions for HTE estimation. The HBB partially pools the stratum-specific distributions, thereby allowing principled borrowing of confounder information across strata when sparsity is a concern. We show that posterior inference under the HBB can yield efficiency gains over standard marginalization approaches while avoiding strong parametric assumptions about the confounder distribution. We use our approach to estimate the adverse event risk of proton versus photon chemoradiotherapy across various cancer types.

Submitted 4 January, 2023; v1 submitted 22 September, 2020; originally announced September 2020.

Journal ref: The International Journal of Biostatistics, 2022

arXiv:2009.00785 [pdf, ps, other]

Analysis of survival data with non-proportional hazards: A comparison of propensity score weighted methods

Authors: Elizabeth A. Handorf, Marc Smaldone, Sujana Movva, Nandita Mitra

Abstract: One of the most common ways researchers compare survival outcomes across treatments when confounding is present is using Cox regression. This model is limited by its underlying assumption of proportional hazards; in some cases, substantial violations may occur. Here we present and compare approaches which attempt to address this issue, including Cox models with time-varying hazard ratios; parametric accelerated failure time models; Kaplan-Meier curves; and pseudo-observations. To adjust for differences between treatment groups, we use Inverse Probability of Treatment Weighting based on the propensity score. We examine clinically meaningful outcome measures that can be computed and directly compared across each method, namely, survival probability at time T, median survival, and restricted mean survival. We conduct simulation studies under a range of scenarios, and determine the biases, coverages, and standard errors of the Average Treatment Effects for each method. We then apply these approaches to two published observational studies of survival after cancer treatment. The first examines chemotherapy in sarcoma, where survival is very similar initially, but after two years the chemotherapy group shows a benefit. The other study is a comparison of surgical techniques for kidney cancer, where survival differences are attenuated over time.

Submitted 29 January, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

arXiv:2007.12973 [pdf, ps, other]

Doubly Robust Nonparametric Instrumental Variable Estimators for Survival Outcomes

Authors: Youjin Lee, Edward H. Kennedy, Nandita Mitra

Abstract: Instrumental variable (IV) methods allow us the opportunity to address unmeasured confounding in causal inference. However, most IV methods are only applicable to discrete or continuous outcomes with very few IV methods for censored survival outcomes. In this work we propose nonparametric estimators for the local average treatment effect on survival probabilities under both nonignorable and ignorable censoring. We provide an efficient influence function-based estimator and a simple estimation procedure when the IV is either binary or continuous. The proposed estimators possess double-robustness properties and can easily incorporate nonparametric estimation using machine learning tools. In simulation studies, we demonstrate the flexibility and efficiency of our proposed estimators under various plausible scenarios. We apply our method to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial for estimating the causal effect of screening on survival probabilities and investigate the causal contrasts between the two interventions under different censoring assumptions.

Submitted 28 September, 2020; v1 submitted 25 July, 2020; originally announced July 2020.

arXiv:2002.04706 [pdf, other]

Bayesian Nonparametric Cost-Effectiveness Analyses: Causal Estimation and Adaptive Subgroup Discovery

Authors: Arman Oganisian, Nandita Mitra, Jason Roy

Abstract: Cost-effectiveness analyses (CEAs) are at the center of health economic decision making. While these analyses help policy analysts and economists determine coverage, inform policy, and guide resource allocation, they are statistically challenging for several reasons. Cost and effectiveness are correlated and follow complex joint distributions which are difficult to capture parametrically. Effectiveness (often measured as increased survival time) and accumulated cost tends to be right-censored in many applications. Moreover, CEAs are often conducted using observational data with non-random treatment assignment. Policy-relevant causal estimation therefore requires robust confounding control. Finally, current CEA methods do not address cost-effectiveness heterogeneity in a principled way - often presenting population-averaged estimates even though significant effect heterogeneity may exist. Motivated by these challenges, we develop a nonparametric Bayesian model for joint cost-survival distributions in the presence of censoring. Our approach utilizes a joint Enriched Dirichlet Process prior on the covariate effects of cost and survival time, while using a Gamma Process prior on the baseline survival time hazard. Causal CEA estimands, with policy-relevant interpretations, are identified and estimated via a Bayesian nonparametric g-computation procedure. Finally, we outline how the induced clustering of the Enriched Dirichlet Process can be used to adaptively detect presence of subgroups with different cost-effectiveness profiles. We outline an MCMC procedure for full posterior inference and evaluate frequentist properties via simulations. We use our model to assess the cost-efficacy of chemotherapy versus radiation adjuvant therapy for treating endometrial cancer in the SEER-Medicare database.

Submitted 8 September, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

arXiv:1912.00039 [pdf, other]

Net benefit separation and the determination curve: a probabilistic framework for cost-effectiveness estimation

Authors: Andrew J. Spieker, Nicholas Illenberger, Jason A. Roy, Nandita Mitra

Abstract: Considerations regarding clinical effectiveness and cost are essential in comparing the overall value of two treatments. There has been growing interest in methodology to integrate cost and effectiveness measures in order to inform policy and promote adequate resource allocation. The net monetary benefit aggregates information on differences in mean cost and clinical outcomes; the cost-effectiveness acceptability curve was then developed to characterize the extent to which the strength of evidence regarding net monetary benefit changes with fluctuations in the willingness-to-pay threshold. Methods to derive insights from characteristics of the cost/clinical outcomes besides mean differences remain undeveloped but may also be informative. We propose a novel probabilistic measure of cost-effectiveness based on the stochastic ordering of the individual net benefit distribution under each treatment. Our approach is able to accommodate features frequently encountered in observational data including confounding and censoring, and complements the net monetary benefit in the insights it provides. We conduct a range of simulations to evaluate finite-sample performance and illustrate our proposed approach using simulated data based on a study of endometrial cancer patients.

Submitted 2 December, 2019; v1 submitted 29 November, 2019; originally announced December 2019.

Comments: 10 pages; 5 figures; 3 tables

arXiv:1810.09494 [pdf, other]

doi 10.1111/biom.13244

A Bayesian Nonparametric Model for Zero-Inflated Outcomes: Prediction, Clustering, and Causal Estimation

Authors: Arman Oganisian, Nandita Mitra, Jason Roy

Abstract: Researchers are often interested in predicting outcomes, conducting clustering analysis to detect distinct subgroups of their data, or computing causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks - requiring highly flexible, data-adaptive modeling. In this paper, we present a fully nonparametric Bayesian generative model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flow through to the causal effect estimates of interest - allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. Lastly, we use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy and radiation therapy in the SEER medicare database.

Submitted 9 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

arXiv:1705.08742 [pdf, other]

A causal approach to analysis of censored medical costs in the presence of time-varying treatment

Authors: Andrew J. Spieker, Arman Oganisian, Emily M. Ko, Jason A. Roy, Nandita Mitra

Abstract: There has recently been a growing interest in the development of statistical methods to compare medical costs between treatment groups. When cumulative cost is the outcome of interest, right-censoring poses the challenge of informative missingness due to heterogeneity in the rates of cost accumulation across subjects. Existing approaches seeking to address the challenge of informative cost trajectories typically rely on inverse probability weighting and target a net "intent-to-treat" effect. However, no approaches capable of handling time-dependent treatment and confounding in this setting have been developed to date. A method to estimate the joint causal effect of a treatment regime on cost would be of value to inform public policy when comparing interventions. In this paper, we develop a nested g-computation approach to cost analysis in order to accommodate time-dependent treatment and repeated outcome measures. We demonstrate that our procedure is reasonably robust to departures from its distributional assumptions and can provide unique insights into fundamental differences in average cost across time-dependent treatment regimes.

Submitted 24 May, 2017; originally announced May 2017.

arXiv:1608.02273 [pdf, other]

Estimating scaled treatment effects with multiple outcomes

Authors: Edward H. Kennedy, Shreya Kangovi, Nandita Mitra

Abstract: In classical study designs, the aim is often to learn about the effects of a treatment or intervention on a single outcome; in many modern studies, however, data on multiple outcomes are collected and it is of interest to explore effects on multiple outcomes simultaneously. Such designs can be particularly useful in patient-centered research, where different outcomes might be more or less important to different patients. In this paper we propose scaled effect measures (via potential outcome notation) that translate effects on multiple outcomes to a common scale, using mean-variance and median-interquartile-range -based standardizations. We present efficient, nonparametric, doubly robust methods for estimating these scaled effects (and weighted average summary measures), and for testing the null hypothesis that treatment affects all outcomes equally. We also discuss methods for exploring how treatment effects depend on covariates (i.e., effect modification). In addition to describing efficiency theory for our estimands and the asymptotic behavior of our estimators, we illustrate the methods in a simulation study and a data analysis. Importantly, and in contrast to much of the literature concerning effects on multiple outcomes, our methods are nonparametric and can be used not only in randomized trials to yield increased efficiency, but also in observational studies with high-dimensional covariates to reduce confounding bias.

Submitted 13 June, 2017; v1 submitted 7 August, 2016; originally announced August 2016.

arXiv:1401.1683 [pdf, ps, other]

doi 10.1214/13-AOAS665

Evaluating costs with unmeasured confounding: A sensitivity analysis for the treatment effect

Authors: Elizabeth A. Handorf, Justin E. Bekelman, Daniel F. Heitjan, Nandita Mitra

Abstract: Estimates of the effects of treatment on cost from observational studies are subject to bias if there are unmeasured confounders. It is therefore advisable in practice to assess the potential magnitude of such biases. We derive a general adjustment formula for loglinear models of mean cost and explore special cases under plausible assumptions about the distribution of the unmeasured confounder. We assess the performance of the adjustment by simulation, in particular, examining robustness to a key assumption of conditional independence between the unmeasured and measured covariates given the treatment indicator. We apply our method to SEER-Medicare cost data for a stage II/III muscle-invasive bladder cancer cohort. We evaluate the costs for radical cystectomy vs. combined radiation/chemotherapy, and find that the significance of the treatment effect is sensitive to plausible unmeasured Bernoulli, Poisson and Gamma confounders.

Submitted 8 January, 2014; originally announced January 2014.

Comments: Published at http://dx.doi.org/10.1214/13-AOAS665 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS665

Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 4, 2062-2080

Showing 1–23 of 23 results for author: Mitra, N