Robust Semiparametric Inference for Bayesian Additive Regression Trees

Authors: Christoph Breunig, Ruixuan Liu, Zhengfei Yu

Abstract: We develop a semiparametric framework for inference on the mean response in missing-data settings using a corrected posterior distribution. Our approach is tailored to Bayesian Additive Regression Trees (BART), which is a powerful predictive method but whose nonsmoothness complicates asymptotic theory with multi-dimensional covariates. When using BART combined with Bayesian bootstrap weights, we establish a new Bernstein-von Mises theorem and show that the limit distribution generally contains a bias term. To address this, we introduce RoBART, a posterior bias-correction that robustifies BART for valid inference on the mean response. Monte Carlo studies support our theory, demonstrating reduced bias and improved coverage relative to existing procedures using BART.

Submitted 29 September, 2025; originally announced September 2025.
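
The abstract above refers to Bayesian bootstrap weights. As a rough illustration only (this is not the authors' RoBART procedure, and the fitted values below are hypothetical stand-ins for predictions from a regression model such as BART), the Bayesian bootstrap posterior of a mean can be sketched by reweighting the observations with Dirichlet(1, ..., 1) weights:

```python
import random


def bayesian_bootstrap_mean(values, n_draws=1000, seed=0):
    """Return draws from the Bayesian bootstrap posterior of the mean of `values`."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights: normalize independent Exp(1) variables.
        g = [rng.expovariate(1.0) for _ in values]
        total = sum(g)
        draws.append(sum(gi * v for gi, v in zip(g, values)) / total)
    return draws


# Hypothetical fitted values; in practice these would come from a fitted model.
fitted = [0.2, 0.5, 0.9, 1.4, 2.0]
draws = bayesian_bootstrap_mean(fitted)
posterior_mean = sum(draws) / len(draws)
```

Each draw is a weighted average of the inputs, so the spread of `draws` reflects uncertainty about the covariate distribution only; it does not by itself correct the bias term discussed in the abstract.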

arXiv:2412.04605

Semiparametric Bayesian Difference-in-Differences

Authors: Christoph Breunig, Ruixuan Liu, Zhengfei Yu

Abstract: This paper studies semiparametric Bayesian inference for the average treatment effect on the treated (ATT) within the difference-in-differences (DiD) research design. We propose two new Bayesian methods with frequentist validity. The first one places a standard Gaussian process prior on the conditional mean function of the control group. The second method is a double robust Bayesian procedure that adjusts the prior distribution of the conditional mean function and subsequently corrects the posterior distribution of the resulting ATT. We prove new semiparametric Bernstein-von Mises (BvM) theorems for both proposals. Monte Carlo simulations and an empirical application demonstrate that the proposed Bayesian DiD methods exhibit strong finite-sample performance compared to existing frequentist methods. We also present extensions of the canonical DiD approach, incorporating both the staggered design and the repeated cross-sectional design.

Submitted 15 June, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

arXiv:2211.16298

Double Robust Bayesian Inference on Average Treatment Effects

Authors: Christoph Breunig, Ruixuan Liu, Zhengfei Yu

Abstract: We propose a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconfoundedness. For our new Bayesian approach, we first adjust the prior distributions of the conditional mean functions, and then correct the posterior distribution of the resulting ATE. Both adjustments make use of pilot estimators motivated by the semiparametric influence function for ATE estimation. We prove asymptotic equivalence of our Bayesian procedure and efficient frequentist ATE estimators by establishing a new semiparametric Bernstein-von Mises theorem under double robustness; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score and vice versa. Consequently, the resulting Bayesian credible sets form confidence intervals with asymptotically exact coverage probability. In simulations, our method provides precise point estimates of the ATE through the posterior mean and credible intervals that closely align with the nominal coverage probability. Furthermore, our approach achieves a shorter interval length in comparison to existing methods. We illustrate our method in an application to the National Supported Work Demonstration following LaLonde [1986] and Dehejia and Wahba [1999].

Submitted 25 February, 2025; v1 submitted 29 November, 2022; originally announced November 2022.
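
The abstract above compares the Bayesian procedure with efficient frequentist ATE estimators built on the semiparametric influence function. As a minimal sketch of that frequentist benchmark only (the standard augmented inverse probability weighting estimator, not the authors' Bayesian method), assuming the propensity scores and conditional mean predictions have already been estimated and are passed in as plain lists (all argument names are hypothetical):

```python
def aipw_ate(y, d, pscore, mu1, mu0):
    """Doubly robust (AIPW) estimate of the ATE and its standard error.

    y      : observed outcomes
    d      : treatment indicators (1 = treated, 0 = control)
    pscore : estimated propensity scores P(D = 1 | X), strictly in (0, 1)
    mu1    : estimated E[Y | D = 1, X] for each unit
    mu0    : estimated E[Y | D = 0, X] for each unit
    """
    n = len(y)
    scores = []
    for yi, di, pi, m1, m0 in zip(y, d, pscore, mu1, mu0):
        # Outcome-regression contrast plus inverse-probability-weighted residuals.
        score = (m1 - m0) + di * (yi - m1) / pi - (1 - di) * (yi - m0) / (1 - pi)
        scores.append(score)
    ate = sum(scores) / n
    variance = sum((s - ate) ** 2 for s in scores) / n
    return ate, (variance / n) ** 0.5


# Toy data (made up for illustration).
y = [3.0, 1.0, 4.0, 2.0]
d = [1, 0, 1, 0]
pscore = [0.5, 0.5, 0.5, 0.5]

# With outcome predictions set to zero, the estimator reduces to inverse probability weighting.
ate_ipw, se_ipw = aipw_ate(y, d, pscore, [0.0] * 4, [0.0] * 4)
```

The estimate remains consistent if either the propensity score or the conditional mean functions are estimated well, which is the double robustness property the abstract refers to.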

arXiv:2107.05936

Testability of Reverse Causality Without Exogenous Variation

Authors: Christoph Breunig, Patrick Burauel

Abstract: This paper shows that testability of reverse causality is possible even in the absence of exogenous variation, such as in the form of instrumental variables. Instead of relying on exogenous variation, we achieve testability by imposing relatively weak model restrictions and exploiting that a dependence of residual and purported cause is informative about the causal direction. Our main assumption is that the true functional relationship is nonlinear and that error terms are additively separable. We extend previous results by incorporating control variables and allowing heteroskedastic errors. We build on reproducing kernel Hilbert space (RKHS) embeddings of probability distributions to test conditional independence and demonstrate the efficacy in detecting the causal direction in both Monte Carlo simulations and an application to German survey data.

Submitted 26 April, 2024; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2101.12282

Simple Adaptive Estimation of Quadratic Functionals in Nonparametric IV Models

Authors: Christoph Breunig, Xiaohong Chen

Abstract: This paper considers adaptive, minimax estimation of a quadratic functional in a nonparametric instrumental variables (NPIV) model, which is an important problem in optimal estimation of a nonlinear functional of an ill-posed inverse regression with an unknown operator. We first show that a leave-one-out, sieve NPIV estimator of the quadratic functional can attain a convergence rate that coincides with the lower bound previously derived in Chen and Christensen [2018]. The minimax rate is achieved by the optimal choice of the sieve dimension (a key tuning parameter) that depends on the smoothness of the NPIV function and the degree of ill-posedness, both of which are unknown in practice. We next propose a Lepski-type data-driven choice of the key sieve dimension adaptive to the unknown NPIV model features. The adaptive estimator of the quadratic functional is shown to attain the minimax optimal rate in the severely ill-posed case and in the regular mildly ill-posed case, but up to a multiplicative $\sqrt{\log n}$ factor in the irregular mildly ill-posed case.

Submitted 8 February, 2022; v1 submitted 28 January, 2021; originally announced January 2021.

arXiv:2009.12665

Nonclassical Measurement Error in the Outcome Variable

Authors: Christoph Breunig, Stephan Martin

Abstract: We study a semi-/nonparametric regression model with a general form of nonclassical measurement error in the outcome variable. We show equivalence of this model to a generalized regression model. Our main identifying assumptions are a special regressor type restriction and monotonicity in the nonlinear relationship between the observed and unobserved true outcome. Nonparametric identification is then obtained under a normalization of the unknown link function, which is a natural extension of the classical measurement error case. We propose a novel sieve rank estimator for the regression function and establish its rate of convergence. In Monte Carlo simulations, we find that our estimator corrects for biases induced by nonclassical measurement error and provides numerically stable results. We apply our method to analyze belief formation of stock market expectations with survey data from the German Socio-Economic Panel (SOEP) and find evidence for nonclassical measurement error in subjective belief data.

Submitted 30 May, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

arXiv:2006.09587

Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models

Authors: Christoph Breunig, Xiaohong Chen

Abstract: We propose a new adaptive hypothesis test for inequality (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model. Our test statistic is based on a modified leave-one-out sample analog of a quadratic distance between the restricted and unrestricted sieve two-stage least squares estimators. We provide computationally simple, data-driven choices of sieve tuning parameters and Bonferroni adjusted chi-squared critical values. Our test adapts to the unknown smoothness of alternative functions in the presence of an unknown degree of endogeneity and unknown strength of the instruments. It attains the adaptive minimax rate of testing in $L^{2}$. That is, the sum of the supremum of type I error over the composite null and the supremum of type II error over nonparametric alternative models cannot be minimized by any other test for NPIV models of unknown regularities. Confidence sets in $L^{2}$ are obtained by inverting the adaptive test. Simulations confirm that, across different instrument strengths and sample sizes, our adaptive test controls size and its finite-sample power greatly exceeds that of existing non-adaptive tests for monotonicity and parametric restrictions in NPIV models. Empirical applications to test for shape restrictions of differentiated products demand and of Engel curves are presented.

Submitted 6 November, 2024; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:1909.10133

doi 10.1016/j.jeconom.2014.09.006

Goodness-of-Fit Tests based on Series Estimators in Nonparametric Instrumental Regression

Authors: Christoph Breunig

Abstract: This paper proposes several tests of restricted specification in nonparametric instrumental regression. Based on series estimators, test statistics are established that allow for tests of the general model against a parametric or nonparametric specification as well as a test of exogeneity of the vector of regressors. The tests' asymptotic distributions under correct specification are derived and their consistency against any alternative model is shown. Under a sequence of local alternative hypotheses, the asymptotic distributions of the tests are derived. Moreover, uniform consistency is established over a class of alternatives whose distance to the null hypothesis shrinks appropriately as the sample size increases. A Monte Carlo study examines finite sample performance of the test statistics.

Submitted 22 September, 2019; originally announced September 2019.

Journal ref: Journal of Econometrics, Volume 184, Issue 2, Feb 2015, Pages 328-346

arXiv:1909.10129

doi 10.1017/S0266466619000288

Specification Testing in Nonparametric Instrumental Quantile Regression

Authors: Christoph Breunig

Abstract: There are many environments in econometrics which require nonseparable modeling of a structural disturbance. In a nonseparable model with endogenous regressors, key conditions are validity of instrumental variables and monotonicity of the model in a scalar unobservable variable. Under these conditions the nonseparable model is equivalent to an instrumental quantile regression model. A failure of the key conditions, however, makes instrumental quantile regression potentially inconsistent. This paper develops a methodology for testing whether the instrumental quantile regression model is correctly specified. Our test statistic is asymptotically normally distributed under correct specification and consistent against any alternative model. In addition, test statistics to justify the model simplification are established. Finite sample properties are examined in a Monte Carlo study and an empirical illustration is provided.

Submitted 22 September, 2019; originally announced September 2019.

Journal ref: Econom. Theory 36 (2020) 583-625

arXiv:1810.00411

Nonparametric Regression with Selectively Missing Covariates

Authors: Christoph Breunig, Peter Haan

Abstract: We consider the problem of regression with selectively observed covariates in a nonparametric framework. Our approach relies on instrumental variables that explain variation in the latent covariates but have no direct effect on selection. The regression function of interest is shown to be a weighted version of observed conditional expectation where the weighting function is a fraction of selection probabilities. Nonparametric identification of the fractional probability weight (FPW) function is achieved via a partial completeness assumption. We provide primitive functional form assumptions for partial completeness to hold. The identification result is constructive for the FPW series estimator. We derive the rate of convergence and also the pointwise asymptotic distribution. In both cases, the asymptotic performance of the FPW series estimator does not suffer from the inverse problem which derives from the nonparametric instrumental variable approach. In a Monte Carlo study, we analyze the finite sample properties of our estimator and we compare our approach to inverse probability weighting, which can be used alternatively for unconditional moment estimation. In the empirical part, we consider two applications. In the first, we estimate the association between income and health using linked data from the SHARE survey and administrative pension information and use pension entitlements as an instrument. In the second application we revisit the question of how income affects the demand for housing based on data from the German Socio-Economic Panel Study (SOEP). In this application we use regional income information on the residential block level as an instrument. In both applications we show that income is selectively missing and we demonstrate that standard methods that do not account for the nonrandom selection process lead to significantly biased estimates for individuals with low income.

Submitted 13 October, 2020; v1 submitted 30 September, 2018; originally announced October 2018.

arXiv:1806.00666

Ill-posed Estimation in High-Dimensional Models with Instrumental Variables

Authors: Christoph Breunig, Enno Mammen, Anna Simoni

Abstract: This paper is concerned with inference about low-dimensional components of a high-dimensional parameter vector $β^0$ which is identified through instrumental variables. We allow for eigenvalues of the expected outer product of included and excluded covariates, denoted by $M$, to shrink to zero as the sample size increases. We propose a novel estimator based on desparsification of an instrumental variable Lasso estimator, which is a regularized version of 2SLS with an additional correction term. This estimator converges to $β^0$ at a rate depending on the mapping properties of $M$ captured by a sparse link condition. Linear combinations of our estimator of $β^0$ are shown to be asymptotically normally distributed. Based on consistent covariance estimation, our method allows for constructing confidence intervals and statistical tests for single or low-dimensional components of $β^0$. In Monte Carlo simulations we analyze the finite sample behavior of our estimator.

Submitted 3 August, 2020; v1 submitted 2 June, 2018; originally announced June 2018.

arXiv:1804.03110

Varying Random Coefficient Models

Authors: Christoph Breunig

Abstract: This paper provides a new methodology to analyze unobserved heterogeneity when observed characteristics are modeled nonlinearly. The proposed model builds on varying random coefficients (VRC) that are determined by nonlinear functions of observed regressors and additively separable unobservables. This paper proposes a novel estimator of the VRC density based on weighted sieve minimum distance. The main example of a sieve basis is Hermite functions, which yield a numerically stable estimation procedure. This paper shows inference results that go beyond what has been shown in ordinary RC models. We provide in each case rates of convergence and also establish pointwise limit theory of linear functionals, where a prominent example is the density of potential outcomes. In addition, a multiplier bootstrap procedure is proposed to construct uniform confidence bands. A Monte Carlo study examines finite sample properties of the estimator and shows that it performs well even when the regressors associated with RC are far from being heavy tailed. Finally, the methodology is applied to analyze heterogeneity in income elasticity of demand for housing.

Submitted 3 August, 2020; v1 submitted 9 April, 2018; originally announced April 2018.

Showing 1–12 of 12 results for author: Breunig, C