Search | arXiv e-print repository

Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles

Authors: Aldo Gael Carranza, Sanath Kumar Krishnamurthy, Susan Athey

Abstract: Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, building on recent work on oracle-based bandit algorithms, we provide the first reduction of contextual bandits to general-purpose heterogeneous treatment effect estimation, and we design a simple and computationally efficient algorithm based on this reduction. Our theoretical and experimental results demonstrate that heterogeneous treatment effect estimation in contextual bandits offers practical advantages over reward estimation, including more efficient model estimation and greater flexibility to model misspecification.

Submitted 24 February, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2010.13013 [pdf, other]

Tractable contextual bandits beyond realizability

Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

Abstract: Tractable contextual bandit algorithms often rely on the realizability assumption - i.e., that the true expected reward model belongs to a known class, such as linear functions. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times a function of the mean squared error between the best model in the class and the true model, where T is the total number of time-steps. Our work sheds light on the bias-variance trade-off for tractable contextual bandits. This trade-off is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.

Submitted 25 February, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: 35 pages, 6 figures

arXiv:1808.05293 [pdf, ps, other]

Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption

Authors: Susan Athey, Guido Imbens

Abstract: In this paper we study estimation of and inference for average treatment effects in a setting with panel data. We focus on the setting where units, e.g., individuals, firms, or states, adopt the policy or treatment of interest at a particular point in time, and then remain exposed to this treatment at all times afterwards. We take a design perspective where we investigate the properties of estimators and procedures given assumptions on the assignment process. We show that under random assignment of the adoption date the standard Difference-In-Differences estimator is an unbiased estimator of a particular weighted average causal effect. We characterize the properties of this estimand, and show that the standard variance estimator is conservative.

Submitted 1 September, 2018; v1 submitted 15 August, 2018; originally announced August 2018.

arXiv:1807.11408 [pdf, other]

Local Linear Forests

Authors: Rina Friedberg, Julie Tibshirani, Susan Athey, Stefan Wager

Abstract: Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals, and can show poor predictive performance in the presence of strong, smooth effects. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence for random forests with smooth signals, and provides substantial gains in accuracy on both real and simulated data. We prove a central limit theorem valid under regularity conditions on the forest and smoothness constraints, and propose a computationally efficient construction for confidence intervals. Moving to a causal inference application, we discuss the merits of local regression adjustments for heterogeneous treatment effect estimation, and give an example on a dataset exploring the effect word choice has on attitudes to the social safety net. Last, we include simulation results on real and generated data.

Submitted 4 September, 2020; v1 submitted 30 July, 2018; originally announced July 2018.

Comments: Forthcoming in the Journal of Computational and Graphical Statistics

arXiv:1710.10251 [pdf, other]

doi 10.1080/01621459.2021.1891924

Matrix Completion Methods for Causal Panel Data Models

Authors: Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, Khashayar Khosravi

Abstract: In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the "missing" elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.

Submitted 21 April, 2022; v1 submitted 27 October, 2017; originally announced October 2017.

Comments: 42 pages, 10 figures

Journal ref: Journal of American Statistical Association (JASA), Vol 116 (536), 2021

arXiv:1710.02926 [pdf, ps, other]

When Should You Adjust Standard Errors for Clustering?

Authors: Alberto Abadie, Susan Athey, Guido Imbens, Jeffrey Wooldridge

Abstract: In empirical work it is common to estimate parameters of models and report associated standard errors that account for "clustering" of units, where clusters are defined by factors such as geography. Clustering adjustments are typically motivated by the concern that unobserved components of outcomes for units within clusters are correlated. However, this motivation does not provide guidance about questions such as: (i) Why should we adjust standard errors for clustering in some situations but not others? How can we justify the common practice of clustering in observational studies but not randomized experiments, or clustering by state but not by gender? (ii) Why is conventional clustering a potentially conservative "all-or-nothing" adjustment, and are there alternative methods that respond to data and are less conservative? (iii) In what settings does the choice of whether and how to cluster make a difference? We address these questions using a framework of sampling and design inference. We argue that clustering can be needed to address sampling issues if sampling follows a two stage process where in the first stage, a subset of clusters are sampled from a population of clusters, and in the second stage, units are sampled from the sampled clusters. Then, clustered standard errors account for the existence of clusters in the population that we do not see in the sample. Clustering can be needed to account for design issues if treatment assignment is correlated with membership in a cluster. We propose new variance estimators to deal with intermediate settings where conventional cluster standard errors are unnecessarily conservative and robust standard errors are too small.

Submitted 19 September, 2022; v1 submitted 8 October, 2017; originally announced October 2017.

arXiv:1706.01778 [pdf, ps, other]

Sampling-based vs. Design-based Uncertainty in Regression Analysis

Authors: Alberto Abadie, Susan Athey, Guido W. Imbens, Jeffrey M. Wooldridge

Abstract: Consider a researcher estimating the parameters of a regression function based on data for all 50 states in the United States or on data for all visits to a website. What is the interpretation of the estimated parameters and the standard errors? In practice, researchers typically assume that the sample is randomly drawn from a large population of interest and report standard errors that are designed to capture sampling variation. This is common even in applications where it is difficult to articulate what that population of interest is, and how it differs from the sample. In this article, we explore an alternative approach to inference, which is partly design-based. In a design-based setting, the values of some of the regressors can be manipulated, perhaps through a policy intervention. Design-based uncertainty emanates from lack of knowledge about the values that the regression outcome would have taken under alternative interventions. We derive standard errors that account for design-based uncertainty instead of, or in addition to, sampling-based uncertainty. We show that our standard errors in general are smaller than the usual infinite-population sampling-based standard errors and provide conditions under which they coincide.

Submitted 21 June, 2019; v1 submitted 6 June, 2017; originally announced June 2017.

arXiv:1702.02896 [pdf, other]

Policy Learning with Observational Data

Authors: Susan Athey, Stefan Wager

Abstract: In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application-specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to this problem motivated by the theory of semiparametrically efficient estimation. Our method can be used to optimize either binary treatments or infinitesimal nudges to continuous treatments, and can leverage observational data where causal effects are identified using a variety of strategies, including selection on observables and instrumental variables. Given a doubly robust estimator of the causal effect of assigning everyone to treatment, we develop an algorithm for choosing whom to treat, and establish strong guarantees for the asymptotic utilitarian regret of the resulting policy.

Submitted 4 September, 2020; v1 submitted 9 February, 2017; originally announced February 2017.

Comments: Forthcoming in Econometrica. Original title: Efficient Policy Learning

arXiv:1604.07125 [pdf, other]

Approximate Residual Balancing: De-Biased Inference of Average Treatment Effects in High Dimensions

Authors: Susan Athey, Guido W. Imbens, Stefan Wager

Abstract: There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pre-treatment variables. The unconfoundedness assumption is often more plausible if a large number of pre-treatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. In this paper, we develop a method for de-biasing penalized regression adjustments to allow sparse regression methods like the lasso to be used for √n-consistent inference of average treatment effects in high-dimensional linear models. Given linearity, we do not need to assume that the treatment propensities are estimable, or that the average treatment effect is a sparse contrast of the outcome model parameters. Rather, in addition to standard assumptions used to make lasso regression on the outcome model consistent under 1-norm error, we only require overlap, i.e., that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.

Submitted 31 January, 2018; v1 submitted 25 April, 2016; originally announced April 2016.

Comments: Forthcoming in the Journal of the Royal Statistical Society, Series B

arXiv:1510.04342 [pdf, other]

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Authors: Stefan Wager, Susan Athey

Abstract: Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.

Submitted 9 July, 2017; v1 submitted 14 October, 2015; originally announced October 2015.

Comments: To appear in the Journal of the American Statistical Association. Part of the results developed in this paper were made available as an earlier technical report "Asymptotic Theory for Random Forests", available at (arXiv:1405.0352)

arXiv:1506.02084 [pdf, ps, other]

Exact P-values for Network Interference

Authors: Susan Athey, Dean Eckles, Guido Imbens

Abstract: We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit's treatment status on another according to the distance between units; for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (for example based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, for example, only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experiment is validated by the design of the original experiment.

Submitted 5 June, 2015; originally announced June 2015.

Comments: 40 pages

Showing 1–11 of 11 results for author: Athey, S