-
Learning Conditional Average Treatment Effects in Regression Discontinuity Designs using Bayesian Additive Regression Trees
Authors:
Rafael Alcantara,
P. Richard Hahn,
Carlos Carvalho,
Hedibert Lopes
Abstract:
BART (Bayesian additive regression trees) has been established as a leading supervised learning method, particularly in the field of causal inference. This paper explores the use of BART models for learning conditional average treatment effects (CATE) from regression discontinuity designs, where treatment assignment is based on whether an observed covariate (called the running variable) exceeds a pre-specified threshold. A purpose-built version of BART that uses linear regression leaf models (of the running variable and treatment assignment dummy) is shown to out-perform off-the-shelf BART implementations as well as a local polynomial regression approach and a CART-based approach. The new method is evaluated in thorough simulation studies as well as an empirical application looking at the effect of academic probation on student performance.
Submitted 28 February, 2025;
originally announced March 2025.
-
Modified BART for Learning Heterogeneous Effects in Regression Discontinuity Designs
Authors:
Rafael Alcantara,
Meijia Wang,
P. Richard Hahn,
Hedibert Lopes
Abstract:
This paper introduces BART-RDD, a sum-of-trees regression model built around a novel regression tree prior, which incorporates the special covariate structure of regression discontinuity designs. Specifically, the tree splitting process is constrained to ensure overlap within a narrow band surrounding the running variable cutoff value, where the treatment effect is identified. It is shown that unmodified BART-based models estimate RDD treatment effects poorly, while our modified model accurately recovers treatment effects at the cutoff. Specifically, BART-RDD is perhaps the first RDD method that effectively learns conditional average treatment effects. The new method is investigated in thorough simulation studies as well as an empirical application looking at the effect of academic probation on student performance in subsequent terms (Lindo et al., 2010).
Submitted 19 July, 2024;
originally announced July 2024.
-
LongBet: Heterogeneous Treatment Effect Estimation in Panel Data
Authors:
Meijia Wang,
Ignacio Martinez,
P. Richard Hahn
Abstract:
This paper introduces a novel approach for estimating heterogeneous treatment effects of a binary treatment in panel data, focusing particularly on short panels with large cross-sections and observed confounders. In contrast to the traditional difference-in-differences literature, which often relies on the parallel trends assumption, our proposed model does not require such an assumption. Instead, it leverages observed confounders to impute potential outcomes and identify treatment effects. The method is a Bayesian semi-parametric approach based on the Bayesian causal forest model, which is extended here to suit panel data settings. Being Bayesian, the approach provides uncertainty quantification for the estimates. Simulation studies demonstrate its performance with and without parallel trends. Additionally, our proposed model enables the estimation of conditional average treatment effects, a capability that is rarely available in panel data settings.
Submitted 4 June, 2024;
originally announced June 2024.
-
Deep Learning for Causal Inference: A Comparison of Architectures for Heterogeneous Treatment Effect Estimation
Authors:
Demetrios Papakostas,
Andrew Herren,
P. Richard Hahn,
Francisco Castillo
Abstract:
Causal inference has gained much popularity in recent years, with interests ranging from academic, to industrial, to educational, and all in between. Concurrently, the study and usage of neural networks has also grown profoundly (albeit at a far faster rate). What we aim to do in this blog write-up is demonstrate a Neural Network causal inference architecture. We develop a fully connected neural network implementation of the popular Bayesian Causal Forest algorithm, a state of the art tree based method for estimating heterogeneous treatment effects. We compare our implementation to existing neural network causal inference methodologies, showing improvements in performance in simulation settings. We apply our method to a dataset examining the effect of stress on sleep.
Submitted 5 May, 2024;
originally announced May 2024.
-
On true versus estimated propensity scores for treatment effect estimation with discrete controls
Authors:
Andrew Herren,
P. Richard Hahn
Abstract:
The finite sample variance of an inverse propensity weighted estimator is derived in the case of discrete control variables with finite support. The obtained expressions generally corroborate widely-cited asymptotic theory showing that estimated propensity scores are superior to true propensity scores in the context of inverse propensity weighting. However, similar analysis of a modified estimator demonstrates that foreknowledge of the true propensity function can confer a statistical advantage when estimating average treatment effects.
Submitted 18 May, 2023;
originally announced May 2023.
-
Feature selection in stratification estimators of causal effects: lessons from potential outcomes, causal diagrams, and structural equations
Authors:
P. Richard Hahn,
Andrew Herren
Abstract:
What is the ideal regression (if any) for estimating average causal effects? We study this question in the setting of discrete covariates, deriving expressions for the finite-sample variance of various stratification estimators. This approach clarifies the fundamental statistical phenomena underlying many widely-cited results. Our exposition combines insights from three distinct methodological traditions for studying causal effect estimation: potential outcomes, causal diagrams, and structural models with additive errors.
Submitted 23 September, 2022;
originally announced September 2022.
-
Stochastic Tree Ensembles for Estimating Heterogeneous Effects
Authors:
Nikolay Krantsevich,
Jingyu He,
P. Richard Hahn
Abstract:
Determining subgroups that respond especially well (or poorly) to specific interventions (medical or policy) requires new supervised learning methods tailored specifically for causal inference. Bayesian Causal Forest (BCF) is a recent method that has been documented to perform well on data generating processes with strong confounding of the sort that is plausible in many applications. This paper develops a novel algorithm for fitting the BCF model, which is more efficient than the previously available Gibbs sampler. The new algorithm can be used to initialize independent chains of the existing Gibbs sampler leading to better posterior exploration and coverage of the associated interval estimates in simulation studies. The new algorithm is compared to related approaches via simulation studies as well as an empirical analysis.
Submitted 14 September, 2022;
originally announced September 2022.
-
Statistical Aspects of SHAP: Functional ANOVA for Model Interpretation
Authors:
Andrew Herren,
P. Richard Hahn
Abstract:
SHAP is a popular method for measuring variable importance in machine learning models. In this paper, we study the algorithm used to estimate SHAP scores and outline its connection to the functional ANOVA decomposition. We use this connection to show that challenges in SHAP approximations largely relate to the choice of a feature distribution and the number of $2^p$ ANOVA terms estimated. We argue that the connection between machine learning explainability and sensitivity analysis is illuminating in this case, but the immediate practical consequences are not obvious since the two fields face a different set of constraints. Machine learning explainability concerns models which are inexpensive to evaluate but often have hundreds, if not thousands, of features. Sensitivity analysis typically deals with models from physics or engineering which may be very time consuming to run, but operate on a comparatively small space of inputs.
Submitted 11 November, 2022; v1 submitted 21 August, 2022;
originally announced August 2022.
-
Local Gaussian process extrapolation for BART models with applications to causal inference
Authors:
Meijia Wang,
Jingyu He,
P. Richard Hahn
Abstract:
Bayesian additive regression trees (BART) is a semi-parametric regression model offering state-of-the-art performance on out-of-sample prediction. Despite this success, standard implementations of BART typically provide inaccurate prediction and overly narrow prediction intervals at points outside the range of the training data. This paper proposes a novel extrapolation strategy that grafts Gaussian processes to the leaf nodes in BART for predicting points outside the range of the observed data. The new method is compared to standard BART implementations and recent frequentist resampling-based methods for predictive inference. We apply the new approach to a challenging problem from causal inference, wherein for some regions of predictor space, only treated or untreated units are observed (but not both). In simulation studies, the new approach boasts superior performance compared to popular alternatives, such as Jackknife+.
Submitted 24 February, 2023; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Bayesian decision theory for tree-based adaptive screening tests with an application to youth delinquency
Authors:
Chelsea Krantsevich,
P. Richard Hahn,
Yi Zheng,
Charles Katz
Abstract:
Crime prevention strategies based on early intervention depend on accurate risk assessment instruments for identifying high risk youth. It is important in this context that the instruments be convenient to administer, which means, in particular, that they should also be reasonably brief; adaptive screening tests are useful for this purpose. Adaptive tests constructed using classification and regression trees are becoming a popular alternative to traditional Item Response Theory (IRT) approaches for adaptive testing. However, tree-based adaptive tests lack a principled criterion for terminating the test. This paper develops a Bayesian decision theory framework for measuring the trade-off between brevity and accuracy, when considering tree-based adaptive screening tests of different lengths. We also present a novel method for designing tree-based adaptive tests, motivated by this framework. The framework and associated adaptive test method are demonstrated through an application to youth delinquency risk assessment in Honduras; it is shown that an adaptive test requiring a subject to answer fewer than 10 questions can identify high risk youth nearly as accurately as an unabridged survey containing 173 items.
Submitted 27 June, 2022; v1 submitted 18 June, 2021;
originally announced June 2021.
-
Do forecasts of bankruptcy cause bankruptcy? A machine learning sensitivity analysis
Authors:
Demetrios Papakostas,
P. Richard Hahn,
Jared Murray,
Frank Zhou,
Joseph Gerakos
Abstract:
It is widely speculated that auditors' public forecasts of bankruptcy are, at least in part, self-fulfilling prophecies in the sense that they might actually cause bankruptcies that would not have otherwise occurred. This conjecture is hard to prove, however, because the strong association between bankruptcies and bankruptcy forecasts could simply indicate that auditors are skillful forecasters with unique access to highly predictive covariates. In this paper, we investigate the causal effect of bankruptcy forecasts on bankruptcy using nonparametric sensitivity analysis. We contrast our analysis with two alternative approaches: a linear bivariate probit model with an endogenous regressor, and a recently developed bound on risk ratios called E-values. Additionally, our machine learning approach incorporates a monotonicity constraint corresponding to the assumption that bankruptcy forecasts do not make bankruptcies less likely. Finally, a tree-based posterior summary of the treatment effect estimates allows us to explore which observable firm characteristics moderate the inducement effect.
Submitted 23 June, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Semi-supervised learning and the question of true versus estimated propensity scores
Authors:
Andrew Herren,
P. Richard Hahn
Abstract:
A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as "unlabeled" if treatment assignment and covariates are observed but outcomes are unobserved. According to this formulation, large unlabeled data sets could be used to estimate a high dimensional propensity function and causal inference using a much smaller labeled data set could proceed via weighted estimators using the learned propensity scores. In the limiting case of infinite unlabeled data, one may estimate the high dimensional propensity function exactly. However, longstanding advice in the causal inference community suggests that estimated propensity scores (from labeled data alone) are actually preferable to true propensity scores, implying that the unlabeled data is actually useless in this context. In this paper we examine this paradox and propose a simple procedure that reconciles the strong intuition that a known propensity function should be useful for estimating treatment effects with the previous literature suggesting otherwise. Further, simulation studies suggest that direct regression may be preferable to inverse-propensity weight estimators in many circumstances.
Submitted 14 September, 2020;
originally announced September 2020.
-
Estimating heterogeneous effects of continuous exposures using Bayesian tree ensembles: revisiting the impact of abortion rates on crime
Authors:
Spencer Woody,
Carlos M. Carvalho,
P. Richard Hahn,
Jared S. Murray
Abstract:
In estimating the causal effect of a continuous exposure or treatment, it is important to control for all confounding factors. However, most existing methods require parametric specification for how control variables influence the outcome or generalized propensity score, and inference on treatment effects is usually sensitive to this choice. Additionally, it is often the goal to estimate how the treatment effect varies across observed units. To address this gap, we propose a semiparametric model using Bayesian tree ensembles for estimating the causal effect of a continuous treatment or exposure which (i) does not require a priori parametric specification of the influence of control variables, and (ii) allows for identification of effect modification by pre-specified moderators. The main parametric assumption we make is that the effect of the exposure on the outcome is linear, with the steepness of this relationship determined by a nonparametric function of the moderators, and we provide heuristics to diagnose the validity of this assumption. We apply our methods to revisit a 2001 study of how abortion rates affect incidence of crime.
Submitted 19 July, 2020;
originally announced July 2020.
-
Stochastic tree ensembles for regularized nonlinear regression
Authors:
Jingyu He,
P. Richard Hahn
Abstract:
This paper develops a novel stochastic tree ensemble method for nonlinear regression, which we refer to as XBART, short for Accelerated Bayesian Additive Regression Trees. By combining regularization and stochastic search strategies from Bayesian modeling with computationally efficient techniques from recursive partitioning approaches, the new method attains state-of-the-art performance: in many settings it is both faster and more accurate than the widely-used XGBoost algorithm. Via careful simulation studies, we demonstrate that our new approach provides accurate point-wise estimates of the mean function and does so faster than popular alternatives, such as BART, XGBoost and neural networks (using Keras). We also prove a number of basic theoretical results about the new algorithm, including consistency of the single tree version of the model and stationarity of the Markov chain produced by the ensemble version. Furthermore, we demonstrate that initializing standard Bayesian additive regression trees Markov chain Monte Carlo (MCMC) at XBART-fitted trees considerably improves credible interval coverage and reduces total run-time.
Submitted 3 June, 2021; v1 submitted 9 February, 2020;
originally announced February 2020.
-
A Symmetric Prior for Multinomial Probit Models
Authors:
Lane F. Burgette,
David Puelz,
P. Richard Hahn
Abstract:
Fitted probabilities from widely used Bayesian multinomial probit models can depend strongly on the choice of a base category, which is used to uniquely identify the parameters of the model. This paper proposes a novel identification strategy, and associated prior distribution for the model parameters, that renders the prior symmetric with respect to relabeling the outcome categories. The new prior permits an efficient Gibbs algorithm that samples rank-deficient covariance matrices without resorting to Metropolis-Hastings updates.
Submitted 17 May, 2020; v1 submitted 21 December, 2019;
originally announced December 2019.
-
An illustration of the risk of borrowing information via a shared likelihood
Authors:
P. Richard Hahn
Abstract:
A concrete, stylized example illustrates that inferences may be degraded, rather than improved, by incorporating supplementary data via a joint likelihood. In the example, the likelihood is assumed to be correctly specified, as is the prior over the parameter of interest; all that is necessary for the joint modeling approach to suffer is misspecification of the prior over a nuisance parameter.
Submitted 23 May, 2019;
originally announced May 2019.
-
Atlantic Causal Inference Conference (ACIC) Data Analysis Challenge 2017
Authors:
P. Richard Hahn,
Vincent Dorie,
Jared S. Murray
Abstract:
This brief note documents the data generating processes used in the 2017 Data Analysis Challenge associated with the Atlantic Causal Inference Conference (ACIC). The focus of the challenge was estimation and inference for conditional average treatment effects (CATEs) in the presence of targeted selection, which leads to strong confounding. The associated data files and further plots can be found on the first author's web page.
Submitted 23 May, 2019;
originally announced May 2019.
-
XBART: Accelerated Bayesian Additive Regression Trees
Authors:
Jingyu He,
Saar Yalov,
P. Richard Hahn
Abstract:
Bayesian additive regression trees (BART) (Chipman et al., 2010) is a powerful predictive model that often outperforms alternative models at out-of-sample prediction. BART is especially well-suited to settings with unstructured predictor variables and substantial sources of unmeasured variation as is typical in the social, behavioral and health sciences. This paper develops a modified version of BART that is amenable to fast posterior estimation. We present a stochastic hill climbing algorithm that matches the remarkable predictive accuracy of previous BART implementations, but is many times faster and less memory intensive. Simulation studies show that the new method is comparable in computation time and more accurate at function estimation than both random forests and gradient boosting.
Submitted 14 March, 2019; v1 submitted 4 October, 2018;
originally announced October 2018.
-
A Survey of Learning Causality with Data: Problems and Methods
Authors:
Ruocheng Guo,
Lu Cheng,
Jundong Li,
P. Richard Hahn,
Huan Liu
Abstract:
This work considers the question of how convenient access to copious data impacts our ability to learn causal effects and relations. In what ways is learning causality in the era of big data different from -- or the same as -- the traditional one? To answer this question, this survey provides a comprehensive and structured review of both traditional and frontier methods in learning causality and relations along with the connections between causality and machine learning. This work points out on a case-by-case basis how big data facilitates, complicates, or motivates each approach.
Submitted 5 May, 2020; v1 submitted 25 September, 2018;
originally announced September 2018.
-
Efficient sampling for Gaussian linear regression with arbitrary priors
Authors:
P. Richard Hahn,
Jingyu He,
Hedibert Lopes
Abstract:
This paper develops a slice sampler for Bayesian linear regression models with arbitrary priors. The new sampler has two advantages over current approaches. One, it is faster than many custom implementations that rely on auxiliary latent variables, if the number of regressors is large. Two, it can be used with any prior with a density function that can be evaluated up to a normalizing constant, making it ideal for investigating the properties of new shrinkage priors without having to develop custom sampling algorithms. The new sampler takes advantage of the special structure of the linear regression likelihood, allowing it to produce better effective sample size per second than common alternative approaches.
Submitted 14 June, 2018;
originally announced June 2018.
-
Regret-based Selection for Sparse Dynamic Portfolios
Authors:
David Puelz,
P. Richard Hahn,
Carlos Carvalho
Abstract:
This paper considers portfolio construction in a dynamic setting. We specify a loss function comprised of utility and complexity components with an unknown tradeoff parameter. We develop a novel regret-based criterion for selecting the tradeoff parameter to construct optimal sparse portfolios over time.
Submitted 23 July, 2017; v1 submitted 30 June, 2017;
originally announced June 2017.
-
Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects
Authors:
P. Richard Hahn,
Jared S. Murray,
Carlos Carvalho
Abstract:
This paper presents a novel nonlinear regression model for estimating heterogeneous treatment effects from observational data, geared specifically towards situations with small effect sizes, heterogeneous effects, and strong confounding. Standard nonlinear regression models, which may work quite well for prediction, have two notable weaknesses when used to estimate heterogeneous treatment effects. First, they can yield badly biased estimates of treatment effects when fit to data with strong confounding. The Bayesian causal forest model presented in this paper avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. Second, standard approaches to response surface modeling do not provide adequate control over the strength of regularization over effect heterogeneity. The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively "shrink to homogeneity". We illustrate these benefits via the reanalysis of an observational study assessing the causal effects of smoking on medical expenditures as well as extensive simulation studies.
Submitted 13 November, 2019; v1 submitted 28 June, 2017;
originally announced June 2017.
-
Variable Selection in Seemingly Unrelated Regressions with Random Predictors
Authors:
David Puelz,
P. Richard Hahn,
Carlos Carvalho
Abstract:
This paper considers linear model selection when the response is vector-valued and the predictors are randomly observed. We propose a new approach that decouples statistical inference from the selection step in a "post-inference model summarization" strategy. We study the impact of predictor uncertainty on the model selection procedure. The method is demonstrated through an application to asset pricing.
Submitted 3 June, 2016; v1 submitted 29 May, 2016;
originally announced May 2016.
-
Regularization and confounding in linear regression for treatment effect estimation
Authors:
P. Richard Hahn,
Carlos M. Carvalho,
Jingyu He,
David Puelz
Abstract:
This paper investigates the use of regularization priors in the context of treatment effect estimation using observational data where the number of control variables is large relative to the number of observations. First, the phenomenon of regularization-induced confounding is introduced, which refers to the tendency of regularization priors to adversely bias treatment effect estimates by over-shrinking control variable regression coefficients. Then, a simultaneous regression model is presented which permits regularization priors to be specified in a way that avoids this unintentional re-confounding. The new model is illustrated on synthetic and empirical data.
Submitted 27 December, 2016; v1 submitted 5 February, 2016;
originally announced February 2016.
-
Optimal ETF Selection for Passive Investing
Authors:
David Puelz,
Carlos M. Carvalho,
P. Richard Hahn
Abstract:
This paper considers the problem of isolating a small number of exchange traded funds (ETFs) that suffice to capture the fundamental dimensions of variation in U.S. financial markets. First, the data is fit to a vector-valued Bayesian regression model, which is a matrix-variate generalization of the well known stochastic search variable selection (SSVS) of George and McCulloch (1993). ETF selection is then performed using the decoupled shrinkage and selection (DSS) procedure described in Hahn and Carvalho (2015), adapted in two ways: to the vector-response setting and to incorporate stochastic covariates. The selected set of ETFs is obtained under a number of different penalty and modeling choices. Optimal portfolios are constructed from selected ETFs by maximizing the Sharpe ratio posterior mean, and they are compared to the (unknown) optimal portfolio based on the full Bayesian model. We compare our selection results to popular ETF advisor Wealthfront.com. Additionally, we consider selecting ETFs by modeling a large set of mutual funds.
Submitted 28 November, 2015; v1 submitted 12 October, 2015;
originally announced October 2015.
-
On recursive Bayesian predictive distributions
Authors:
P. Richard Hahn,
Ryan Martin,
Stephen G. Walker
Abstract:
A Bayesian framework is attractive in the context of prediction, but a fast recursive update of the predictive distribution has apparently been out of reach, in part because Monte Carlo methods are generally used to compute the predictive. This paper shows that online Bayesian prediction is possible by characterizing the Bayesian predictive update in terms of a bivariate copula, making it unnecessary to pass through the posterior to update the predictive. In standard models, the Bayesian predictive update corresponds to familiar choices of copula but, in nonparametric problems, the appropriate copula may not have a closed-form expression. In such cases, our new perspective suggests a fast recursive approximation to the predictive density, in the spirit of Newton's predictive recursion algorithm, but without requiring evaluation of normalizing constants. Consistency of the new algorithm is shown, and numerical examples demonstrate its quality performance in finite samples compared to fully Bayesian and kernel methods.
Submitted 30 April, 2017; v1 submitted 29 August, 2015;
originally announced August 2015.
-
Model specification via sequential coherence and backward induction
Authors:
P. Richard Hahn
Abstract:
This paper describes how to specify probability models for data analysis via a backward induction procedure. The new approach yields coherent, prior-free uncertainty assessment. After presenting some intuition-building examples, the new approach is applied to a kernel density estimator, which leads to a novel method for computing point-wise credible intervals in nonparametric density estimation. The new approach has two additional advantages: 1) the posterior mean density can be accurately approximated without resorting to Monte Carlo simulation and 2) concentration bounds are easily established as a function of sample size.
Submitted 20 February, 2015;
originally announced February 2015.
-
A Bayesian hierarchical model for inferring player strategy types in a number guessing game
Authors:
P. Richard Hahn,
Indranil Goswami,
Carl Mela
Abstract:
This paper presents an in-depth statistical analysis of an experiment designed to measure the extent to which players in a simple game behave according to a popular behavioral economic model. The p-beauty contest is a multi-player number guessing game that has been widely used to study strategic behavior. This paper describes beauty contest experiments for an audience of data analysts, with a special focus on a class of models for game play called k-step thinking models, which allow each player in the game to employ an idiosyncratic strategy. We fit a Bayesian statistical model to estimate the proportion of our player population whose game play is compatible with a k-step thinking model. Our findings put this number at approximately 25%.
Submitted 16 September, 2014;
originally announced September 2014.
-
Decoupling shrinkage and selection in Bayesian linear models: a posterior summary perspective
Authors:
P. Richard Hahn,
Carlos M. Carvalho
Abstract:
Selecting a subset of variables for linear models remains an active area of research. This paper reviews many of the recent contributions to the Bayesian model selection and shrinkage prior literature. A posterior variable selection summary is proposed, which distills a full posterior distribution over regression coefficients into a sequence of sparse linear predictors.
Submitted 3 August, 2014;
originally announced August 2014.
-
Shrinkage priors for linear instrumental variable models with many instruments
Authors:
P. Richard Hahn,
Hedibert Lopes
Abstract:
This paper addresses the weak instruments problem in linear instrumental variable models from a Bayesian perspective. The new approach has two components. First, a novel predictor-dependent shrinkage prior is developed for the many instruments setting. The prior is constructed based on a factor model decomposition of the matrix of observed instruments, allowing many instruments to be incorporated into the analysis in a robust way.
Second, the new prior is implemented via an importance sampling scheme, which utilizes posterior Monte Carlo samples from a first-stage Bayesian regression analysis. This modular computation makes sensitivity analyses straightforward.
Two simulation studies are provided to demonstrate the advantages of the new method. As an empirical illustration, the new method is used to estimate a key parameter in macro-economic models: the elasticity of inter-temporal substitution. The empirical analysis produces substantive conclusions in line with previous studies, but certain inconsistencies of earlier analyses are resolved.
Submitted 3 August, 2014;
originally announced August 2014.
-
A Bayesian partial identification approach to inferring the prevalence of accounting misconduct
Authors:
P. Richard Hahn,
Jared S. Murray,
Ioanna Manolopoulou
Abstract:
This paper describes the use of flexible Bayesian regression models for estimating a partially identified probability function. Our approach permits efficient sensitivity analysis concerning the posterior impact of priors on the partially identified component of the regression model. The new methodology is illustrated on an important problem where only partially observed data is available - inferring the prevalence of accounting misconduct among publicly traded U.S. businesses.
Submitted 6 March, 2015; v1 submitted 31 July, 2014;
originally announced July 2014.
-
A Structural Approach to Coordinate-Free Statistics
Authors:
Tom LaGatta,
P. Richard Hahn
Abstract:
We consider the question of learning in general topological vector spaces. By exploiting known (or parametrized) covariance structures, our Main Theorem demonstrates that any continuous linear map corresponds to a certain isomorphism of embedded Hilbert spaces. By inverting this isomorphism and extending continuously, we construct a version of the Ordinary Least Squares estimator in absolute generality. Our Gauss-Markov theorem demonstrates that OLS is a "best linear unbiased estimator", extending the classical result. We construct a stochastic version of the OLS estimator, which is a continuous disintegration exactly for the class of "uncorrelated implies independent" (UII) measures. As a consequence, Gaussian measures always exhibit continuous disintegrations through continuous linear maps, extending a theorem of the first author. Applying this framework to some problems in machine learning, we prove a useful representation theorem for covariance tensors, and show that OLS defines a good kriging predictor for vector-valued arrays on general index spaces. We also construct a support-vector machine classifier in this setting. We hope that our article shines light on some deeper connections between probability theory, statistics and machine learning, and may serve as a point of intersection for these three communities.
Submitted 5 May, 2014; v1 submitted 1 May, 2014;
originally announced May 2014.
-
Predictor-dependent shrinkage for linear regression via partial factor modeling
Authors:
P. Richard Hahn,
Sayan Mukherjee,
Carlos Carvalho
Abstract:
In prediction problems with more predictors than observations, it can sometimes be helpful to use a joint probability model, $π(Y,X)$, rather than a purely conditional model, $π(Y \mid X)$, where $Y$ is a scalar response variable and $X$ is a vector of predictors. This approach is motivated by the fact that in many situations the marginal predictor distribution $π(X)$ can provide useful information about the parameter values governing the conditional regression. However, under very mild misspecification, this marginal distribution can also lead conditional inferences astray. Here, we explore these ideas in the context of linear factor models, to understand how they play out in a familiar setting. The resulting Bayesian model performs well across a wide range of covariance structures, on real and simulated data.
Submitted 16 November, 2010;
originally announced November 2010.