-
Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect
Authors:
Ruiyang Lin,
Yongyi Guo,
Kyra Gan
Abstract:
The weighted controlled direct effect (WCDE) generalizes the standard controlled direct effect (CDE) by averaging over the mediator distribution, providing a robust estimate when treatment effects vary across mediator levels. This makes the WCDE especially relevant in fairness analysis, where it isolates the direct effect of an exposure on an outcome, independent of mediating pathways. This work establishes three fundamental advances for WCDE in observational studies: First, we establish necessary and sufficient conditions for the unique identifiability of the WCDE, clarifying when it diverges from the CDE. Next, we consider nonparametric estimation of the WCDE and derive its influence function, focusing on the class of regular and asymptotically linear estimators. Lastly, we characterize the optimal covariate adjustment set that minimizes the asymptotic variance, demonstrating how mediator-confounder interactions introduce distinct requirements compared to average treatment effect estimation. Our results offer a principled framework for efficient estimation of direct effects in complex causal systems, with practical applications in fairness and mediation analysis.
Submitted 22 June, 2025; v1 submitted 11 June, 2025;
originally announced June 2025.
-
Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits
Authors:
Brian Cho,
Dominik Meier,
Kyra Gan,
Nathan Kallus
Abstract:
In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickly as possible. We show that GAI can be efficiently solved by combining a reward-maximizing sampling algorithm with a novel nonparametric anytime-valid sequential test for labeling arm means. We first establish that our sequential test maintains error control under highly nonparametric assumptions and asymptotically achieves the minimax optimal e-power, a notion of power for anytime-valid tests. Next, by pairing regret-minimizing sampling schemes with our sequential test, we provide an approach that achieves minimax optimal stopping times for labeling arms with means above a threshold, under an error probability constraint. Our empirical results validate our approach beyond the minimax setting, reducing the expected number of samples for all stopping times by at least 50% across both synthetic and real-world settings.
Submitted 20 October, 2024;
originally announced October 2024.
-
CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies
Authors:
Brian M Cho,
Ana-Roxana Pop,
Kyra Gan,
Sam Corbett-Davies,
Israel Nir,
Ariel Evnine,
Nathan Kallus
Abstract:
When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least a pre-specified probability. We focus on threshold policies, a ubiquitous class of policies with applications in economics, healthcare, and digital advertising. Existing methods rely on potentially underpowered safety checks and limit the opportunities for finding safe improvements, so too often they must revert to the baseline to maintain safety. We overcome these issues by leveraging the most powerful safety test in the asymptotic regime and allowing for multiple candidates to be tested for improvement over the baseline. We show that in adversarial settings, our approach controls the rate of adopting a policy worse than the baseline to the pre-specified error level, even in moderate sample sizes. We present CSPI and CSPI-MT, two novel heuristics for selecting cutoff(s) to maximize the policy improvement from baseline. We demonstrate through both synthetic and external datasets that our approaches improve both the detection rates of safe policies and the realized improvement, particularly under stringent safety requirements and low signal-to-noise conditions.
Submitted 21 August, 2024;
originally announced August 2024.
-
Local Causal Discovery for Structural Evidence of Direct Discrimination
Authors:
Jacqueline Maasch,
Kyra Gan,
Violet Chen,
Agni Orfanoudaki,
Nil-Jana Akpinar,
Fei Wang
Abstract:
Identifying the causal pathways of unfairness is a critical objective for improving policy design and algorithmic decision-making. Prior work in causal fairness analysis often requires knowledge of the causal graph, hindering practical applications in complex or low-knowledge domains. Moreover, global discovery methods that learn causal structure from data can display unstable performance on finite samples, preventing robust fairness conclusions. To mitigate these challenges, we introduce local discovery for direct discrimination (LD3): a method that uncovers structural evidence of direct unfairness by identifying the causal parents of an outcome variable. LD3 performs a linear number of conditional independence tests relative to variable set size, and allows for latent confounding under the sufficient condition that all parents of the outcome are observed. We show that LD3 returns a valid adjustment set (VAS) under a new graphical criterion for the weighted controlled direct effect, a qualitative indicator of direct discrimination. LD3 limits unnecessary adjustment, providing interpretable VAS for assessing unfairness. We use LD3 to analyze causal fairness in two complex decision systems: criminal recidivism prediction and liver transplant allocation. LD3 was more time-efficient and returned more plausible results on real-world data than baselines, which took 46$\times$ to 5870$\times$ longer to execute.
Submitted 19 December, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams
Authors:
Brian Cho,
Kyra Gan,
Nathan Kallus
Abstract:
We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $α$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoretical guarantees on type-I error control, power, and asymptotic growth rate/$e$-power in the setting of a single data stream; (2) we introduce PEAK, a generalization of this betting scheme to multiple streams, that (i) avoids using wasteful union bounds via averaging, (ii) is a test of power one under mild regularity conditions on the sampling scheme of the streams, and (iii) reduces computational overhead when applying testing-by-betting approaches for pure-exploration bandit problems. We illustrate the practical benefits of PEAK using both synthetic and real-world HeartSteps datasets. Our experiments show that PEAK provides up to an 85% reduction in the number of samples before stopping compared to existing stopping rules for pure-exploration bandit problems, and matches the performance of state-of-the-art sequential tests while improving upon computational complexity.
Submitted 2 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs
Authors:
Jacqueline Maasch,
Weishen Pan,
Shantanu Gupta,
Volodymyr Kuleshov,
Kyra Gan,
Fei Wang
Abstract:
Causal discovery is crucial for causal inference in observational studies, as it can enable the identification of valid adjustment sets (VAS) for unbiased effect estimation. However, global causal discovery is notoriously hard in the nonparametric setting, with exponential time and sample complexity in the worst case. To address this, we propose local discovery by partitioning (LDP): a local causal discovery method that is tailored for downstream inference tasks without requiring parametric and pretreatment assumptions. LDP is a constraint-based procedure that returns a VAS for an exposure-outcome pair under latent confounding, given sufficient conditions. The total number of independence tests performed is worst-case quadratic with respect to the cardinality of the variable set. Asymptotic theoretical guarantees are numerically validated on synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baseline discovery algorithms, with LDP outperforming on confounder recall, runtime, and test count for VAS discovery. Notably, LDP ran at least 1300x faster than baselines on a benchmark.
Submitted 1 June, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Anytime-valid inference in N-of-1 trials
Authors:
Ivana Malenica,
Yongyi Guo,
Kyra Gan,
Stefan Konigorski
Abstract:
App-based N-of-1 trials offer a scalable experimental design for assessing the effects of health interventions at an individual level. Their practical success depends on the strong motivation of participants, which, in turn, translates into high adherence and reduced loss to follow-up. One way to maintain participant engagement is by sharing their interim results. Continuously testing hypotheses during a trial, known as "peeking", can also lead to shorter, lower-risk trials by detecting strong effects early. Nevertheless, traditionally, results are only presented upon the trial's conclusion. In this work, we introduce a potential outcomes framework that permits interim peeking of the results and enables statistically valid inferences to be drawn at any point during N-of-1 trials. Our work builds on the growing literature on valid confidence sequences, which enables anytime-valid inference with uniform type-1 error guarantees over time. We propose several causal estimands for treatment effects applicable in an N-of-1 trial and demonstrate, through empirical evaluation, that the proposed approach results in valid confidence sequences over time. We anticipate that incorporating anytime-valid inference into clinical trials can significantly enhance trial participation and empower participants.
Submitted 13 September, 2023;
originally announced September 2023.
-
Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters
Authors:
Brian Cho,
Yaroslav Mukhin,
Kyra Gan,
Ivana Malenica
Abstract:
When estimating target parameters in nonparametric models with nuisance parameters, substituting the unknown nuisances with nonparametric estimators can introduce "plug-in bias." Traditional methods addressing this suboptimal bias-variance trade-off rely on the influence function (IF) of the target parameter. When estimating multiple target parameters, these methods require debiasing the nuisance parameter multiple times using the corresponding IFs, which poses analytical and computational challenges. In this work, we leverage the targeted maximum likelihood estimation (TMLE) framework to propose a novel method named kernel debiased plug-in estimation (KDPE). KDPE refines an initial estimate through regularized likelihood maximization steps, employing a nonparametric model based on reproducing kernel Hilbert spaces. We show that KDPE: (i) simultaneously debiases all pathwise differentiable target parameters that satisfy our regularity conditions, (ii) does not require the IF for implementation, and (iii) remains computationally tractable. We numerically illustrate the use of KDPE and validate our theoretical results.
Submitted 2 June, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Greedy Approximation Algorithms for Active Sequential Hypothesis Testing
Authors:
Kyra Gan,
Su Jia,
Andrew Li
Abstract:
In the problem of active sequential hypothesis testing (ASHT), a learner seeks to identify the true hypothesis from among a known set of hypotheses. The learner is given a set of actions and knows the random distribution of the outcome of any action under any true hypothesis. Given a target error $δ>0$, the goal is to sequentially select the fewest number of actions so as to identify the true hypothesis with probability at least $1 - δ$. Motivated by applications in which the number of hypotheses or actions is massive (e.g., genomics-based cancer detection), we propose efficient (greedy, in fact) algorithms and provide the first approximation guarantees for ASHT, under two types of adaptivity. Both of our guarantees are independent of the number of actions and logarithmic in the number of hypotheses. We numerically evaluate the performance of our algorithms using both synthetic and real-world DNA mutation data, demonstrating that our algorithms outperform previously proposed heuristic policies by large margins.
Submitted 6 October, 2021; v1 submitted 6 March, 2021;
originally announced March 2021.
-
Causal Inference With Selectively Deconfounded Data
Authors:
Kyra Gan,
Andrew A. Li,
Zachary C. Lipton,
Sridhar Tayur
Abstract:
Given only data generated by a standard confounding graph with an unobserved confounder, the Average Treatment Effect (ATE) is not identifiable. To estimate the ATE, a practitioner must then either (a) collect deconfounded data; (b) run a clinical trial; or (c) elucidate further properties of the causal graph that might render the ATE identifiable. In this paper, we consider the benefit of incorporating a large confounded observational dataset (confounder unobserved) alongside a small deconfounded observational dataset (confounder revealed) when estimating the ATE. Our theoretical results suggest that the inclusion of confounded data can significantly reduce the quantity of deconfounded data required to estimate the ATE to within a desired accuracy level. Moreover, in some cases -- say, genetics -- we could imagine retrospectively selecting samples to deconfound. We demonstrate that by actively selecting these samples based upon the (already observed) treatment and outcome, we can reduce sample complexity further. Our theoretical and empirical results establish that the worst-case relative performance of our approach (vs. a natural benchmark) is bounded while our best-case gains are unbounded. Finally, we demonstrate the benefits of selective deconfounding using a large real-world dataset related to genetic mutation in cancer.
Submitted 6 March, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.