-
The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation
Authors:
Matias D. Cattaneo,
Jason M. Klusowski,
Ruiqi Rae Yu
Abstract:
Recursive decision trees have emerged as a leading methodology for heterogeneous causal treatment effect estimation and inference in experimental and observational settings. These procedures are fitted using the celebrated CART (Classification And Regression Tree) algorithm [Breiman et al., 1984], or custom variants thereof, and hence are believed to be "adaptive" to high-dimensional data, sparsity, or other specific features of the underlying data generating process. Athey and Imbens [2016] proposed several "honest" causal decision tree estimators, which have become the standard in both academia and industry. We study their estimators, and variants thereof, and establish lower bounds on their estimation error. We demonstrate that these popular heterogeneous treatment effect estimators cannot achieve a polynomial-in-$n$ convergence rate under basic conditions, where $n$ denotes the sample size. Contrary to common belief, honesty does not resolve these limitations and at best delivers negligible logarithmic improvements in sample size or dimension. As a result, these commonly used estimators can exhibit poor performance in practice, and even be inconsistent in some settings. Our theoretical insights are empirically validated through simulations.
Submitted 14 September, 2025;
originally announced September 2025.
-
Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis
Authors:
Matias D. Cattaneo,
Boris Shigida
Abstract:
We analyze gradient descent with Polyak heavy-ball momentum (HB) whose fixed momentum parameter $β\in (0, 1)$ provides exponential decay of memory. Building on Kovachki and Stuart (2021), we prove that on an exponentially attractive invariant manifold the algorithm is exactly plain gradient descent with a modified loss, provided that the step size $h$ is small enough. Although the modified loss does not admit a closed-form expression, we describe it with arbitrary precision and prove global (finite "time" horizon) approximation bounds $O(h^{R})$ for any finite order $R \geq 2$. We then conduct a fine-grained analysis of the combinatorics underlying the memoryless approximations of HB, in particular, finding a rich family of polynomials in $β$ hidden inside which contains Eulerian and Narayana polynomials. We derive continuous modified equations of arbitrary approximation order (with rigorous bounds) and the principal flow that approximates the HB dynamics, generalizing Rosca et al. (2023). Approximation theorems cover both full-batch and mini-batch HB. Our theoretical results shed new light on the main features of gradient descent with heavy-ball momentum, and outline a road-map for similar analysis of other optimization algorithms.
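As a rough illustration of the "exponential decay of memory" mentioned in the abstract, the sketch below writes the heavy-ball recursion in two equivalent ways: the usual two-term update, and an unrolled form in which each step uses all past gradients weighted by powers of $β$. The function names, the quadratic test loss, and the parameter values are assumptions made for this example only; the modified loss and the approximation bounds studied in the paper are not reproduced.

```python
def heavy_ball(grad, x0, h, beta, steps):
    """Iterate x_{k+1} = x_k - h*grad(x_k) + beta*(x_k - x_{k-1}), with x_{-1} = x_0."""
    xs = [x0]
    prev, x = x0, x0
    for _ in range(steps):
        nxt = x - h * grad(x) + beta * (x - prev)
        prev, x = x, nxt
        xs.append(x)
    return xs


def heavy_ball_unrolled(grad, x0, h, beta, steps):
    """Same iterates with explicit memory:
    x_{k+1} = x_k - h * sum_{j<=k} beta**(k-j) * grad(x_j)."""
    xs = [x0]
    grads = []
    for k in range(steps):
        grads.append(grad(xs[-1]))
        # Past gradients enter with exponentially decaying weights beta**(k-j).
        memory = sum(beta ** (k - j) * g for j, g in enumerate(grads))
        xs.append(xs[-1] - h * memory)
    return xs


def quadratic_grad(x):
    """Gradient of the illustrative loss f(x) = x**2 / 2 (an assumption for the demo)."""
    return x


hb = heavy_ball(quadratic_grad, 1.0, 0.01, 0.9, 200)
unrolled = heavy_ball_unrolled(quadratic_grad, 1.0, 0.01, 0.9, 200)
max_gap = max(abs(a - b) for a, b in zip(hb, unrolled))
```

The two trajectories agree up to floating-point rounding, which makes the dependence on the full gradient history explicit. The weights $β^{k-j}$ sum to at most $1/(1-β)$, which is why momentum is often described informally as rescaling the step size by that factor.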
Submitted 10 September, 2025;
originally announced September 2025.
-
The Regression Discontinuity Design in Medical Science
Authors:
Matias D. Cattaneo,
Rocio Titiunik
Abstract:
This article provides an introduction to the Regression Discontinuity (RD) design, and its application to empirical research in the medical sciences. While the main focus of this article is on causal interpretation, key concepts of estimation and inference are also briefly mentioned. A running medical empirical example is provided.
Submitted 5 August, 2025;
originally announced August 2025.
-
Leveraging Covariates in Regression Discontinuity Designs
Authors:
Matias D. Cattaneo,
Filippo Palomba
Abstract:
It is common practice to incorporate additional covariates in empirical economics. In the context of Regression Discontinuity (RD) designs, covariate adjustment plays multiple roles, making it essential to understand its impact on analysis and conclusions. Typically implemented via local least squares regressions, covariate adjustment can serve three main distinct purposes: (i) improving the efficiency of RD average causal effect estimators, (ii) learning about heterogeneous RD policy effects, and (iii) changing the RD parameter of interest. This article discusses and illustrates empirically how to leverage covariates effectively in RD designs.
Submitted 18 July, 2025;
originally announced July 2025.
-
rd2d: Causal Inference in Boundary Discontinuity Designs
Authors:
Matias D. Cattaneo,
Rocio Titiunik,
Ruiqi Rae Yu
Abstract:
Boundary discontinuity designs -- also known as Multi-Score Regression Discontinuity (RD) designs, with Geographic RD designs as a prominent example -- are often used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score. This article introduces the R package rd2d, which implements and extends the methodological results developed in Cattaneo, Titiunik and Yu (2025) for boundary discontinuity designs. The package employs local polynomial estimation and inference using either the bivariate score or a univariate distance-to-boundary metric. It features novel data-driven bandwidth selection procedures, and offers both pointwise and uniform estimation and inference along the assignment boundary. The numerical performance of the package is demonstrated through a simulation study.
Submitted 10 June, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
-
Estimation and Inference in Boundary Discontinuity Designs
Authors:
Matias D. Cattaneo,
Rocio Titiunik,
Ruiqi Rae Yu
Abstract:
Boundary Discontinuity Designs are used to learn about treatment effects along a continuous boundary that splits units into control and treatment groups according to a bivariate score variable. These research designs are also called Multi-Score Regression Discontinuity Designs, a leading special case being Geographic Regression Discontinuity Designs. We study the statistical properties of commonly used local polynomial treatment effects estimators along the continuous treatment assignment boundary. We consider two distinct approaches: one based explicitly on the bivariate score variable for each unit, and the other based on their univariate distance to the boundary. For each approach, we present pointwise and uniform estimation and inference methods for the treatment effect function over the assignment boundary. Notably, we show that methods based on univariate distance to the boundary exhibit an irreducible large misspecification bias when the assignment boundary has kinks or other irregularities, making the distance-based approach unsuitable for empirical work in those settings. In contrast, methods based on the bivariate score variable do not suffer from that drawback. We illustrate our methods with an empirical application. Companion general-purpose software is provided.
Submitted 8 May, 2025;
originally announced May 2025.
-
Treatment Effect Heterogeneity in Regression Discontinuity Designs
Authors:
Sebastian Calonico,
Matias D. Cattaneo,
Max H. Farrell,
Filippo Palomba,
Rocio Titiunik
Abstract:
Empirical studies using Regression Discontinuity (RD) designs often explore heterogeneous treatment effects based on pretreatment covariates, even though no formal statistical methods exist for such analyses. This has led to the widespread use of ad hoc approaches in applications. Motivated by common empirical practice, we develop a unified, theoretically grounded framework for RD heterogeneity analysis. We show that a fully interacted local linear (in functional parameters) model effectively captures heterogeneity while still being tractable and interpretable in applications. The model structure holds without loss of generality for discrete covariates. Although our proposed model is potentially restrictive for continuous covariates, it naturally aligns with standard empirical practice and offers a causal interpretation for RD applications. We establish principled bandwidth selection and robust bias-corrected inference methods to analyze heterogeneous treatment effects and test group differences. We provide companion software to facilitate implementation of our results. An empirical application illustrates the practical relevance of our methods.
Submitted 3 July, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
Robust Inference for the Direct Average Treatment Effect with Treatment Assignment Interference
Authors:
Matias D. Cattaneo,
Yihan He,
Ruiqi Rae Yu
Abstract:
This paper develops methods for uncertainty quantification in causal inference settings with random network interference. We study the large-sample distributional properties of the classical difference-in-means Hajek treatment effect estimator, and propose a robust inference procedure for the (conditional) direct average treatment effect. Our framework allows for cross-unit interference in both the outcome equation and the treatment assignment mechanism. Drawing from statistical physics, we introduce a novel Ising model to capture complex dependencies in treatment assignment, and derive three results. First, we establish a Berry-Esseen-type distributional approximation that holds pointwise in the degree of interference induced by the Ising model. This approximation recovers existing results in the absence of treatment interference, and highlights the fragility of inference procedures that do not account for the presence of interference in treatment assignment. Second, we establish a uniform distributional approximation for the Hajek estimator and use it to develop robust inference procedures that remain valid uniformly over all interference regimes allowed by the model. Third, we propose a novel resampling method to implement the robust inference procedure and validate its performance through Monte Carlo simulations. A key technical innovation is the introduction of a conditional i.i.d. Gaussianization that may have broader applications. We also discuss extensions and generalizations of our results.
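For readers unfamiliar with the point estimator named in the abstract, the following is a minimal sketch of a difference-in-means (Hajek-type) estimate: the average outcome among treated units minus the average outcome among control units. The function name and the toy data are hypothetical, and the sketch does not attempt the Ising-model treatment assignment or the interference-robust inference procedure that the paper develops.

```python
def difference_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units.

    `treated` holds 0/1 indicators aligned with `outcomes`.
    """
    treated_y = [y for y, t in zip(outcomes, treated) if t == 1]
    control_y = [y for y, t in zip(outcomes, treated) if t == 0]
    if not treated_y or not control_y:
        raise ValueError("need at least one treated and one control unit")
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)


# Toy data (made up for illustration): treated mean 3.5, control mean 1.5.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
estimate = difference_in_means(y, t)
```

Dividing by the realized group sizes, rather than by their expected values, is what gives the estimator its self-normalized form.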
Submitted 26 June, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
How Memory in Optimization Algorithms Implicitly Modifies the Loss
Authors:
Matias D. Cattaneo,
Boris Shigida
Abstract:
In modern optimization methods used in deep learning, each update depends on the history of previous iterations, often referred to as memory, and this dependence decays fast as the iterates go further into the past. For example, gradient descent with momentum has exponentially decaying memory through exponentially averaged past gradients. We introduce a general technique for identifying a memoryless algorithm that approximates an optimization algorithm with memory. It is obtained by replacing all past iterates in the update by the current one, and then adding a correction term arising from memory (also a function of the current iterate). This correction term can be interpreted as a perturbation of the loss, and the nature of this perturbation can inform how memory implicitly (anti-)regularizes the optimization dynamics. As an application of our theory, we find that Lion does not have the kind of implicit anti-regularization induced by memory that AdamW does, providing a theory-based explanation for Lion's better generalization performance recently documented.
Submitted 4 February, 2025;
originally announced February 2025.
-
Randomization Inference for Before-and-After Studies with Multiple Units: An Application to a Criminal Procedure Reform in Uruguay
Authors:
Matias D. Cattaneo,
Carlos Diaz,
Rocio Titiunik
Abstract:
Learning about the immediate causal effects of large-scale policy interventions poses a significant challenge for quasi-experimental methods that rely on long-term trends or parametric modeling assumptions. As an alternative, we develop a randomization inference framework for before-and-after studies with multiple units, designed specifically for short-term causal inference and allowing for general assignment mechanisms. The method provides finite-sample-valid statistical inferences without relying on parametric time series models or extrapolation. We demonstrate its utility by analyzing a major criminal justice reform in Uruguay that switched from an inquisitorial to an adversarial system in November 2017. Our method relies on the key assumption of no local time trends near the policy adoption time, which is supported by several falsification tests in our empirical study. We find a statistically significant short-term causal effect: an increase of approximately 25 daily police reports (an 8% rise) in the first week of the new justice system. Our randomization inference framework provides a robust and flexible methodology for evaluating policy adoptions in before-and-after studies with multiple units.
Submitted 8 August, 2025; v1 submitted 20 October, 2024;
originally announced October 2024.
-
Nonlinear Binscatter Methods
Authors:
Matias D. Cattaneo,
Richard K. Crump,
Max H. Farrell,
Yingjie Feng
Abstract:
Binned scatter plots are a powerful statistical tool for empirical work in the social, behavioral, and biomedical sciences. Available methods rely on a quantile-based partitioning estimator of the conditional mean regression function to primarily construct flexible yet interpretable visualization methods, but they can also be used to estimate treatment effects, assess uncertainty, and test substantive domain-specific hypotheses. This paper introduces novel binscatter methods based on nonlinear, possibly nonsmooth M-estimation methods, covering generalized linear, robust, and quantile regression models. We provide a host of theoretical results and practical tools for local constant estimation along with piecewise polynomial and spline approximations, including (i) optimal tuning parameter (number of bins) selection, (ii) confidence bands, and (iii) formal statistical tests regarding functional form or shape restrictions. Our main results rely on novel strong approximations for general partitioning-based estimators covering random, data-driven partitions, which may be of independent interest. We demonstrate our methods with an empirical application studying the relation between the percentage of individuals without health insurance and per capita income at the zip-code level. We provide general-purpose software packages implementing our methods in Python, R, and Stata.
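The "quantile-based partitioning estimator of the conditional mean" that the abstract refers to can be sketched in its simplest (local constant, least-squares) form: sort the data by the covariate, cut it into bins with roughly equal counts, and report the within-bin averages. The function name and toy data below are assumptions for illustration; this is not the interface of the authors' Python, R, or Stata packages, and it omits the nonlinear M-estimation, bin-number selection, and confidence bands that the paper contributes.

```python
def binscatter_means(x, y, n_bins):
    """Return one (mean x, mean y) point per equal-count bin of x."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if n_bins < 1 or n_bins > n:
        raise ValueError("n_bins must be between 1 and the sample size")
    # Ranking observations by x and slicing the ranks yields quantile-spaced bins.
    order = sorted(range(n), key=lambda i: x[i])
    points = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        idx = order[lo:hi]
        mean_x = sum(x[i] for i in idx) / len(idx)
        mean_y = sum(y[i] for i in idx) / len(idx)
        points.append((mean_x, mean_y))
    return points


# Toy data (made up for illustration).
x = [0.1, 0.9, 0.4, 0.6, 0.2, 0.8]
y = [1.0, 9.0, 4.0, 6.0, 2.0, 8.0]
points = binscatter_means(x, y, 3)
```

Plotting the returned points gives the familiar binned scatter plot. Because the bins are formed from the data, the partition is itself random, which is the complication the paper's strong approximation results are designed to handle.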
Submitted 21 July, 2024;
originally announced July 2024.
-
Strong Approximations for Empirical Processes Indexed by Lipschitz Functions
Authors:
Matias D. Cattaneo,
Ruiqi Rae Yu
Abstract:
This paper presents new uniform Gaussian strong approximations for empirical processes indexed by classes of functions based on $d$-variate random vectors ($d\geq1$). First, a uniform Gaussian strong approximation is established for general empirical processes indexed by possibly Lipschitz functions, improving on previous results in the literature. In the setting considered by Rio (1994), and if the function class is Lipschitzian, our result improves the approximation rate $n^{-1/(2d)}$ to $n^{-1/\max\{d,2\}}$, up to a $\operatorname{polylog}(n)$ term, where $n$ denotes the sample size. Remarkably, we establish a valid uniform Gaussian strong approximation at the rate $n^{-1/2}\log n$ for $d=2$, which was previously known to be valid only for univariate ($d=1$) empirical processes via the celebrated Hungarian construction (Komlós et al., 1975). Second, a uniform Gaussian strong approximation is established for multiplicative separable empirical processes indexed by possibly Lipschitz functions, which addresses some outstanding problems in the literature (Chernozhukov et al., 2014, Section 3). Finally, two other uniform Gaussian strong approximation results are presented when the function class is a sequence of Haar basis based on quasi-uniform partitions. Applications to nonparametric density and regression estimation are discussed.
Submitted 12 November, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Protocols for Observational Studies: An Application to Regression Discontinuity Designs
Authors:
Matias D. Cattaneo,
Rocio Titiunik
Abstract:
In his 2022 IMS Medallion Lecture delivered at the Joint Statistical Meetings, Prof. Dylan S. Small eloquently advocated for the use of protocols in observational studies. We discuss his proposal and, inspired by his ideas, we develop a protocol for the regression discontinuity design.
Submitted 18 February, 2024;
originally announced February 2024.
-
Inference with Mondrian Random Forests
Authors:
Matias D. Cattaneo,
Jason M. Klusowski,
William G. Underwood
Abstract:
Random forests are popular methods for regression and classification analysis, and many different variants have been proposed in recent years. One interesting example is the Mondrian random forest, in which the underlying constituent trees are constructed via a Mondrian process. We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator. By combining these results with a carefully crafted debiasing approach and an accurate variance estimator, we present valid statistical inference methods for the unknown regression function. These methods come with explicitly characterized error bounds in terms of the sample size, tree complexity parameter, and number of trees in the forest, and include coverage error rates for feasible confidence interval estimators. Our novel debiasing procedure for the Mondrian random forest also allows it to achieve the minimax-optimal point estimation convergence rate in mean squared error for multivariate $β$-Hölder regression functions, for all $β> 0$, provided that the underlying tuning parameters are chosen appropriately. Efficient and implementable algorithms are devised for both batch and online learning settings, and we study the computational complexity of different Mondrian random forest implementations. Finally, simulations with synthetic data validate our theory and methodology, demonstrating their excellent finite-sample properties.
Submitted 8 April, 2025; v1 submitted 14 October, 2023;
originally announced October 2023.
-
On the Implicit Bias of Adam
Authors:
Matias D. Cattaneo,
Jason M. Klusowski,
Boris Shigida
Abstract:
In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, conversely, impede its reduction (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.
Submitted 16 June, 2024; v1 submitted 31 August, 2023;
originally announced September 2023.
-
Context-Dependent Heterogeneous Preferences: A Comment on Barseghyan and Molinari (2023)
Authors:
Matias D. Cattaneo,
Xinwei Ma,
Yusufcan Masatlioglu
Abstract:
Barseghyan and Molinari (2023) give sufficient conditions for semi-nonparametric point identification of parameters of interest in a mixture model of decision-making under risk, allowing for unobserved heterogeneity in utility functions and limited consideration. A key assumption in the model is that the heterogeneity of risk preferences is unobservable but context-independent. In this comment, we build on their insights and present identification results in a setting where the risk preferences are allowed to be context-dependent.
Submitted 18 May, 2023;
originally announced May 2023.
-
Bootstrap-Assisted Inference for Generalized Grenander-type Estimators
Authors:
Matias D. Cattaneo,
Michael Jansson,
Kenichi Nagasawa
Abstract:
Westling and Carone (2020) proposed a framework for studying the large sample distributional properties of generalized Grenander-type estimators, a versatile class of nonparametric estimators of monotone functions. The limiting distribution of those estimators is representable as the left derivative of the greatest convex minorant of a Gaussian process whose monomial mean can be of unknown order (when the degree of flatness of the function of interest is unknown). The standard nonparametric bootstrap is unable to consistently approximate the large sample distribution of the generalized Grenander-type estimators even if the monomial order of the mean is known, making statistical inference a challenging endeavour in applications. To address this inferential problem, we present a bootstrap-assisted inference procedure for generalized Grenander-type estimators. The procedure relies on a carefully crafted, yet automatic, transformation of the estimator. Moreover, our proposed method can be made "flatness robust" in the sense that it can be made adaptive to the (possibly unknown) degree of flatness of the function of interest. The method requires only the consistent estimation of a single scalar quantity, for which we propose an automatic procedure based on numerical derivative estimation and the generalized jackknife. Under random sampling, our inference method can be implemented using a computationally attractive exchangeable bootstrap procedure. We illustrate our methods with examples and we also provide a small simulation study. The development of formal results is made possible by some technical results that may be of independent interest.
Submitted 4 July, 2024; v1 submitted 23 March, 2023;
originally announced March 2023.
-
A Guide to Regression Discontinuity Designs in Medical Applications
Authors:
Matias D. Cattaneo,
Luke Keele,
Rocio Titiunik
Abstract:
We present a practical guide for the analysis of regression discontinuity (RD) designs in biomedical contexts. We begin by introducing key concepts, assumptions, and estimands within both the continuity-based framework and the local randomization framework. We then discuss modern estimation and inference methods within both frameworks, including approaches for bandwidth or local neighborhood selection, optimal treatment effect point estimation, and robust bias-corrected inference methods for uncertainty quantification. We also overview empirical falsification tests that can be used to support key assumptions. Our discussion focuses on two particular features that are relevant in biomedical research: (i) fuzzy RD designs, which often arise when therapeutic treatments are based on clinical guidelines but patients with scores near the cutoff are treated contrary to the assignment rule; and (ii) RD designs with discrete scores, which are ubiquitous in biomedical applications. We illustrate our discussion with three empirical applications: the effect of CD4 guidelines for anti-retroviral therapy on retention of HIV patients in South Africa, the effect of genetic guidelines for chemotherapy on breast cancer recurrence in the United States, and the effects of age-based patient cost-sharing on healthcare utilization in Taiwan. We provide replication materials employing publicly available statistical software in Python, R and Stata, offering researchers all necessary tools to conduct an RD analysis.
Submitted 16 May, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
A Practical Introduction to Regression Discontinuity Designs: Extensions
Authors:
Matias D. Cattaneo,
Nicolas Idrobo,
Rocio Titiunik
Abstract:
This monograph, together with its accompanying first part Cattaneo, Idrobo and Titiunik (2020), collects and expands the instructional materials we prepared for more than $50$ short courses and workshops on Regression Discontinuity (RD) methodology that we taught between 2014 and 2023. In this second monograph, we discuss several topics in RD methodology that build on and extend the analysis of RD designs introduced in Cattaneo, Idrobo and Titiunik (2020). Our first goal is to present an alternative RD conceptual framework based on local randomization ideas. This methodological approach can be useful in RD designs with discretely-valued scores, and can also be used more broadly as a complement to the continuity-based approach in other settings. Then, employing both continuity-based and local randomization approaches, we extend the canonical Sharp RD design in multiple directions: fuzzy RD designs, RD designs with discrete scores, and multi-dimensional RD designs. The goal of our two-part monograph is purposely practical and hence we focus on the empirical analysis of RD designs.
Submitted 25 March, 2024; v1 submitted 21 January, 2023;
originally announced January 2023.
-
Higher-order Refinements of Small Bandwidth Asymptotics for Density-Weighted Average Derivative Estimators
Authors:
Matias D. Cattaneo,
Max H. Farrell,
Michael Jansson,
Ricardo Masini
Abstract:
The density weighted average derivative (DWAD) of a regression function is a canonical parameter of interest in economics. Classical first-order large sample distribution theory for kernel-based DWAD estimators relies on tuning parameter restrictions and model assumptions that imply an asymptotic linear representation of the point estimator. These conditions can be restrictive, and the resulting distributional approximation may not be representative of the actual sampling distribution of the statistic of interest. In particular, the approximation is not robust to bandwidth choice. Small bandwidth asymptotics offers an alternative, more general distributional approximation for kernel-based DWAD estimators that allows for, but does not require, asymptotic linearity. The resulting inference procedures based on small bandwidth asymptotics were found to exhibit superior finite sample performance in simulations, but no formal theory justifying that empirical success is available in the literature. Employing Edgeworth expansions, this paper shows that small bandwidth asymptotic approximations lead to inference procedures with higher-order distributional properties that are demonstrably superior to those of procedures based on asymptotic linear approximations.
Submitted 15 February, 2024; v1 submitted 31 December, 2022;
originally announced January 2023.
-
On the Pointwise Behavior of Recursive Partitioning and Its Implications for Heterogeneous Causal Effect Estimation
Authors:
Matias D. Cattaneo,
Jason M. Klusowski,
Peter M. Tian
Abstract:
Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference is conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm with non-vanishing probability, even with pruning. Instead, the convergence may be arbitrarily slow or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are seen to each distinctively contribute to achieving nearly optimal performance for the model class considered.
Submitted 6 February, 2024; v1 submitted 19 November, 2022;
originally announced November 2022.
-
Convergence Rates of Oblique Regression Trees for Flexible Function Libraries
Authors:
Matias D. Cattaneo,
Rajita Chandak,
Jason M. Klusowski
Abstract:
We develop a theoretical framework for the analysis of oblique decision trees, where the splits at each decision node occur at linear combinations of the covariates (as opposed to conventional tree constructions that force axis-aligned splits involving only a single covariate). While this methodology has garnered significant attention from the computer science and optimization communities since the mid-80s, the advantages oblique trees offer over their axis-aligned counterparts remain only empirically justified, and explanations for their success are largely based on heuristics. Filling this long-standing gap between theory and practice, we show that oblique regression trees (constructed by recursively minimizing squared error) satisfy a type of oracle inequality and can adapt to a rich library of regression models consisting of linear combinations of ridge functions and their limit points. This provides a quantitative baseline to compare and contrast decision trees with other less interpretable methods, such as projection pursuit regression and neural networks, which target similar model forms. Contrary to popular belief, one need not always trade off interpretability with accuracy. Specifically, we show that, under suitable conditions, oblique decision trees achieve similar predictive accuracy as neural networks for the same library of regression models. To address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, our proposed theoretical framework can accommodate many existing computational tools in the literature. Our results rely on (arguably surprising) connections between recursive adaptive partitioning and sequential greedy approximation algorithms for convex optimization problems (e.g., orthogonal greedy algorithms), which may be of independent theoretical interest. Using our theory and methods, we also study oblique random forests.
Submitted 30 August, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption
Authors:
Matias D. Cattaneo,
Yingjie Feng,
Filippo Palomba,
Rocio Titiunik
Abstract:
We propose principled prediction intervals to quantify the uncertainty of a large class of synthetic control predictions (or estimators) in settings with staggered treatment adoption, offering precise non-asymptotic coverage probability guarantees. From a methodological perspective, we provide a detailed discussion of different causal quantities to be predicted, which we call causal predictands, allowing for multiple treated units with treatment adoption at possibly different points in time. From a theoretical perspective, our uncertainty quantification methods improve on prior literature by (i) covering a large class of causal predictands in staggered adoption settings, (ii) allowing for synthetic control methods with possibly nonlinear constraints, (iii) proposing scalable robust conic optimization methods and principled data-driven tuning parameter selection, and (iv) offering valid uniform inference across post-treatment periods. We illustrate our methodology with an empirical application studying the effects of economic liberalization on real GDP per capita for Sub-Saharan African countries. Companion software packages are provided in Python, R, and Stata.
Submitted 1 February, 2025; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Yurinskii's Coupling for Martingales
Authors:
Matias D. Cattaneo,
Ricardo P. Masini,
William G. Underwood
Abstract:
Yurinskii's coupling is a popular theoretical tool for non-asymptotic distributional analysis in mathematical statistics and applied probability, offering a Gaussian strong approximation with an explicit error bound under easily verifiable conditions. Originally stated in $\ell_2$-norm for sums of independent random vectors, it has recently been extended both to the $\ell_p$-norm, for $1 \leq p \leq \infty$, and to vector-valued martingales in $\ell_2$-norm, under some strong conditions. We present as our main result a Yurinskii coupling for approximate martingales in $\ell_p$-norm, under substantially weaker conditions than those previously imposed. Our formulation further allows for the coupling variable to follow a more general Gaussian mixture distribution, and we provide a novel third-order coupling method which gives tighter approximations in certain settings. We specialize our main result to mixingales, martingales, and independent data, and derive uniform Gaussian mixture strong approximations for martingale empirical processes. Applications to nonparametric partitioning-based and local polynomial regression procedures are provided, alongside central limit theorems for high-dimensional martingale vectors.
Submitted 4 August, 2025; v1 submitted 1 October, 2022;
originally announced October 2022.
-
lpcde: Estimation and Inference for Local Polynomial Conditional Density Estimators
Authors:
Matias D. Cattaneo,
Rajita Chandak,
Michael Jansson,
Xinwei Ma
Abstract:
This paper discusses the R package lpcde, which stands for local polynomial conditional density estimation. It implements the kernel-based local polynomial smoothing methods introduced in Cattaneo, Chandak, Jansson, Ma (2024) for statistical estimation and inference of conditional distributions, densities, and derivatives thereof. The package offers mean square error optimal bandwidth selection and associated point estimators, as well as uncertainty quantification based on robust bias correction both pointwise (e.g., confidence intervals) and uniformly (e.g., confidence bands) over evaluation points. The methods implemented are boundary adaptive whenever the data is compactly supported. The package also implements regularized conditional density estimation methods, ensuring the resulting density estimate is non-negative and integrates to one. We contrast the functionalities of lpcde with existing open-source packages for conditional density estimation, and showcase its main features using simulated and real datasets. An abbreviated version of this article is published in Cattaneo, Chandak, Jansson, Ma (2025 JOSS).
Submitted 7 March, 2025; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Boundary Adaptive Local Polynomial Conditional Density Estimators
Authors:
Matias D. Cattaneo,
Rajita Chandak,
Michael Jansson,
Xinwei Ma
Abstract:
We begin by introducing a class of conditional density estimators based on local polynomial techniques. The estimators are boundary adaptive and easy to implement. We then study the (pointwise and) uniform statistical properties of the estimators, offering characterizations of both probability concentration and distributional approximation. In particular, we establish uniform convergence rates in probability and valid Gaussian distributional approximations for the Studentized t-statistic process. We also discuss implementation issues such as consistent estimation of the covariance function for the Gaussian approximation, optimal integrated mean squared error bandwidth selection, and valid robust bias-corrected inference. We illustrate the applicability of our results by constructing valid confidence bands and hypothesis tests for both parametric specification and shape constraints, explicitly characterizing their approximation errors. A companion R software package implementing our main results is provided.
Submitted 17 December, 2023; v1 submitted 21 April, 2022;
originally announced April 2022.
-
scpi: Uncertainty Quantification for Synthetic Control Methods
Authors:
Matias D. Cattaneo,
Yingjie Feng,
Filippo Palomba,
Rocio Titiunik
Abstract:
The synthetic control method offers a way to quantify the effect of an intervention using weighted averages of untreated units to approximate the counterfactual outcome that the treated unit(s) would have experienced in the absence of the intervention. This method is useful for program evaluation and causal inference in observational studies. We introduce the software package scpi for prediction and inference using synthetic controls, implemented in Python, R, and Stata. For point estimation or prediction of treatment effects, the package offers an array of (possibly penalized) approaches leveraging the latest optimization methods. For uncertainty quantification, the package offers the prediction interval methods introduced by Cattaneo, Feng and Titiunik (2021) and Cattaneo, Feng, Palomba and Titiunik (2022). The paper includes numerical illustrations and a comparison with other synthetic control software.
Submitted 11 October, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Uniform Inference for Kernel Density Estimators with Dyadic Data
Authors:
Matias D. Cattaneo,
Yingjie Feng,
William G. Underwood
Abstract:
Dyadic data is often encountered when quantities of interest are associated with the edges of a network. As such it plays an important role in statistics, econometrics and many other data science disciplines. We consider the problem of uniformly estimating a dyadic Lebesgue density function, focusing on nonparametric kernel-based estimators taking the form of dyadic empirical processes. Our main contributions include the minimax-optimal uniform convergence rate of the dyadic kernel density estimator, along with strong approximation results for the associated standardized and Studentized $t$-processes. A consistent variance estimator enables the construction of valid and feasible uniform confidence bands for the unknown density function. We showcase the broad applicability of our results by developing novel counterfactual density estimation and inference methodology for dyadic data, which can be used for causal inference and program evaluation. A crucial feature of dyadic distributions is that they may be "degenerate" at certain points in the support of the data, a property making our analysis somewhat delicate. Nonetheless our methods for uniform inference remain robust to the potential presence of such points. For implementation purposes, we discuss inference procedures based on positive semi-definite covariance estimators, mean squared error optimal bandwidth selectors and robust bias correction techniques. We illustrate the empirical finite-sample performance of our methods both in simulations and with real-world trade data, for which we make comparisons between observed and counterfactual trade distributions in different years. Our technical results concerning strong approximations and maximal inequalities are of potential independent interest.
Submitted 13 October, 2023; v1 submitted 15 January, 2022;
originally announced January 2022.
-
Covariate Adjustment in Regression Discontinuity Designs
Authors:
Matias D. Cattaneo,
Luke Keele,
Rocio Titiunik
Abstract:
The Regression Discontinuity (RD) design is a widely used non-experimental method for causal inference and program evaluation. While its canonical formulation only requires a score and an outcome variable, it is common in empirical work to encounter RD analyses where additional variables are used for adjustment. This practice has led to misconceptions about the role of covariate adjustment in RD analysis, from both methodological and empirical perspectives. In this chapter, we review the different roles of covariate adjustment in RD designs, and offer methodological guidance for its correct use.
Submitted 24 August, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Regression Discontinuity Designs
Authors:
Matias D. Cattaneo,
Rocio Titiunik
Abstract:
The Regression Discontinuity (RD) design is one of the most widely used non-experimental methods for causal inference and program evaluation. Over the last two decades, statistical and econometric methods for RD analysis have expanded and matured, and there is now a large number of methodological results for RD identification, estimation, inference, and validation. We offer a curated review of this methodological literature organized around the two most popular frameworks for the analysis and interpretation of RD designs: the continuity framework and the local randomization framework. For each framework, we discuss three main topics: (i) designs and parameters, which focuses on different types of RD settings and treatment effects of interest; (ii) estimation and inference, which presents the most popular methods based on local polynomial regression and analysis of experiments, as well as refinements, extensions, and alternatives; and (iii) validation and falsification, which summarizes an array of mostly empirical approaches to support the validity of RD designs in practice.
Submitted 24 February, 2022; v1 submitted 20 August, 2021;
originally announced August 2021.
-
Local Regression Distribution Estimators
Authors:
Matias D. Cattaneo,
Michael Jansson,
Xinwei Ma
Abstract:
This paper investigates the large sample properties of local regression distribution estimators, which include a class of boundary adaptive density estimators as a prime example. First, we establish a pointwise Gaussian large sample distributional approximation in a unified way, allowing for both boundary and interior evaluation points simultaneously. Using this result, we study the asymptotic efficiency of the estimators, and show that a carefully crafted minimum distance implementation based on "redundant" regressors can lead to efficiency gains. Second, we establish uniform linearizations and strong approximations for the estimators, and employ these results to construct valid confidence bands. Third, we develop extensions to weighted distributions with estimated weights and to local $L^{2}$ least squares estimation. Finally, we illustrate our methods with two applications in program evaluation: counterfactual density testing, and IV specification and heterogeneity density analysis. Companion software packages in Stata and R are available.
Submitted 28 January, 2021; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Analysis of Regression Discontinuity Designs with Multiple Cutoffs or Multiple Scores
Authors:
Matias D. Cattaneo,
Rocio Titiunik,
Gonzalo Vazquez-Bare
Abstract:
We introduce the \texttt{Stata} (and \texttt{R}) package \texttt{rdmulti}, which includes three commands (\texttt{rdmc}, \texttt{rdmcplot}, \texttt{rdms}) for analyzing Regression Discontinuity (RD) designs with multiple cutoffs or multiple scores. The command \texttt{rdmc} applies to non-cumulative and cumulative multi-cutoff RD settings. It calculates pooled and cutoff-specific RD treatment effects, and provides robust bias-corrected inference procedures. Post estimation and inference is allowed. The command \texttt{rdmcplot} offers RD plots for multi-cutoff settings. Finally, the command \texttt{rdms} concerns multi-score settings, covering in particular cumulative cutoffs and two running variables contexts. It also calculates pooled and cutoff-specific RD treatment effects, provides robust bias-corrected inference procedures, and allows for post-estimation estimation and inference. These commands employ the \texttt{Stata} (and \texttt{R}) package \texttt{rdrobust} for plotting, estimation, and inference. Companion \texttt{R} functions with the same syntax and capabilities are provided.
Submitted 25 April, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Prediction Intervals for Synthetic Control Methods
Authors:
Matias D. Cattaneo,
Yingjie Feng,
Rocio Titiunik
Abstract:
Uncertainty quantification is a fundamental problem in the analysis and interpretation of synthetic control (SC) methods. We develop conditional prediction intervals in the SC framework, and provide conditions under which these intervals offer finite-sample probability guarantees. Our method allows for covariate adjustment and non-stationary data. The construction begins by noting that the statistical uncertainty of the SC prediction is governed by two distinct sources of randomness: one coming from the construction of the (likely misspecified) SC weights in the pre-treatment period, and the other coming from the unobservable stochastic error in the post-treatment period when the treatment effect is analyzed. Accordingly, our proposed prediction intervals are constructed taking into account both sources of randomness. For implementation, we propose a simulation-based approach along with finite-sample-based probability bound arguments, naturally leading to principled sensitivity analysis methods. We illustrate the numerical performance of our methods using empirical applications and a small simulation study. \texttt{Python}, \texttt{R} and \texttt{Stata} software packages implementing our methodology are available.
Submitted 7 September, 2021; v1 submitted 15 December, 2019;
originally announced December 2019.
-
A Practical Introduction to Regression Discontinuity Designs: Foundations
Authors:
Matias D. Cattaneo,
Nicolas Idrobo,
Rocio Titiunik
Abstract:
In this Element and its accompanying Element, Matias D. Cattaneo, Nicolas Idrobo, and Rocio Titiunik provide an accessible and practical guide for the analysis and interpretation of Regression Discontinuity (RD) designs that encourages the use of a common set of practices and facilitates the accumulation of RD-based empirical evidence. In this Element, the authors discuss the foundations of the canonical Sharp RD design, which has the following features: (i) the score is continuously distributed and has only one dimension, (ii) there is only one cutoff, and (iii) compliance with the treatment assignment is perfect. In the accompanying Element, the authors discuss practical and conceptual extensions to the basic RD setup.
Submitted 21 November, 2019;
originally announced November 2019.
-
lpdensity: Local Polynomial Density Estimation and Inference
Authors:
Matias D. Cattaneo,
Michael Jansson,
Xinwei Ma
Abstract:
Density estimation and inference methods are widely used in empirical work. When the underlying distribution has compact support, conventional kernel-based density estimators are no longer consistent near or at the boundary because of their well-known boundary bias. Alternative smoothing methods are available to handle boundary points in density estimation, but they all require additional tuning parameter choices or other typically ad hoc modifications depending on the evaluation point and/or approach considered. This article discusses the R and Stata package lpdensity implementing a novel local polynomial density estimator proposed and studied in Cattaneo, Jansson, and Ma (2020, 2021), which is boundary adaptive and involves only one tuning parameter. The methods implemented also cover local polynomial estimation of the cumulative distribution function and density derivatives. In addition to point estimation and graphical procedures, the package offers consistent variance estimators, mean squared error optimal bandwidth selection, robust bias-corrected inference, and confidence bands construction, among other features. A comparison with other density estimation packages available in R using a Monte Carlo experiment is provided.
Submitted 22 February, 2021; v1 submitted 15 June, 2019;
originally announced June 2019.
-
The Regression Discontinuity Design
Authors:
Matias D. Cattaneo,
Rocio Titiunik,
Gonzalo Vazquez-Bare
Abstract:
This handbook chapter gives an introduction to the sharp regression discontinuity design, covering identification, estimation, inference, and falsification methods.
Submitted 1 June, 2020; v1 submitted 10 June, 2019;
originally announced June 2019.
-
lspartition: Partitioning-Based Least Squares Regression
Authors:
Matias D. Cattaneo,
Max H. Farrell,
Yingjie Feng
Abstract:
Nonparametric partitioning-based least squares regression is an important tool in empirical work. Common examples include regressions based on splines, wavelets, and piecewise polynomials. This article discusses the main methodological and numerical features of the R software package lspartition, which implements modern results for partitioning-based least squares (series) regression estimation and inference from Cattaneo and Farrell (2013) and Cattaneo, Farrell, and Feng (2019). These results cover the multivariate regression function as well as its derivatives. First, the package provides data-driven methods to choose the number of partition knots optimally, according to integrated mean squared error, yielding optimal point estimation. Second, robust bias correction is implemented to combine this point estimator with valid inference. Third, the package provides estimates and inference for the unknown function both pointwise and uniformly in the conditioning variables. In particular, valid confidence bands are provided. Finally, an extension to two-sample analysis is developed, which can be used in treatment-control comparisons and related problems.
Submitted 8 August, 2019; v1 submitted 1 June, 2019;
originally announced June 2019.
-
nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference
Authors:
Sebastian Calonico,
Matias D. Cattaneo,
Max H. Farrell
Abstract:
Nonparametric kernel density and local polynomial regression estimators are very popular in Statistics, Economics, and many other disciplines. They are routinely employed in applied work, either as part of the main empirical analysis or as a preliminary ingredient entering some other estimation or inference procedure. This article describes the main methodological and numerical features of the software package nprobust, which offers an array of estimation and inference procedures for nonparametric kernel-based density and local polynomial regression methods, implemented in both the R and Stata statistical platforms. The package includes not only classical bandwidth selection, estimation, and inference methods (Wand and Jones, 1995; Fan and Gijbels, 1996), but also other recent developments in the statistics and econometrics literatures such as robust bias-corrected inference and coverage error optimal bandwidth selection (Calonico, Cattaneo and Farrell, 2018, 2019). Furthermore, this article also proposes a simple way of estimating optimal bandwidths in practice that always delivers the optimal mean square error convergence rate regardless of the specific evaluation point, that is, no matter whether it is implemented at a boundary or interior point. Numerical performance is illustrated using an empirical application and simulated data, where a detailed numerical comparison with other R packages is given.
Submitted 1 June, 2019;
originally announced June 2019.
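For readers unfamiliar with the estimator class named in this abstract, the following is a minimal sketch of a local linear regression fit at a single point. It is not the nprobust implementation: the function name `local_linear`, the triangular kernel, and the user-supplied bandwidth `h` are assumptions made for illustration, and no bandwidth selection or robust bias correction is attempted.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] (hypothetical helper, triangular kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x - x0) / h
    w = np.clip(1.0 - np.abs(u), 0.0, None)  # triangular kernel weights, zero outside the window
    keep = w > 0
    if keep.sum() < 2:
        raise ValueError("bandwidth too small: fewer than two observations in the window")
    # Weighted least squares of y on (1, x - x0); the intercept is the fitted value at x0.
    X = np.column_stack([np.ones(keep.sum()), x[keep] - x0])
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return beta[0]
```

Because the fit is linear within the window, the same code applies unchanged at a boundary point such as `x0 = min(x)`, where only observations on one side receive positive weight.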
-
Binscatter Regressions
Authors:
Matias D. Cattaneo,
Richard K. Crump,
Max H. Farrell,
Yingjie Feng
Abstract:
We introduce the package Binsreg, which implements the binscatter methods developed by Cattaneo, Crump, Farrell, and Feng (2024b,a). The package includes seven commands: binsreg, binslogit, binsprobit, binsqreg, binstest, binspwc, and binsregselect. The first four commands implement binscatter plotting, point estimation, and uncertainty quantification (confidence intervals and confidence bands) for least squares linear binscatter regression (binsreg) and for nonlinear binscatter regression (binslogit for Logit regression, binsprobit for Probit regression, and binsqreg for quantile regression). The next two commands focus on pointwise and uniform inference: binstest implements hypothesis testing procedures for parametric specifications and for nonparametric shape restrictions of the unknown regression function, while binspwc implements multi-group pairwise statistical comparisons. Finally, the command binsregselect implements data-driven number of bins selectors. The commands offer binned scatter plots, and allow for covariate adjustment, weighting, clustering, and multi-sample analysis, which is useful when studying treatment effect heterogeneity in randomized and observational studies, among many other features.
Submitted 24 July, 2024; v1 submitted 25 February, 2019;
originally announced February 2019.
-
On Binscatter
Authors:
Matias D. Cattaneo,
Richard K. Crump,
Max H. Farrell,
Yingjie Feng
Abstract:
Binscatter is a popular method for visualizing bivariate relationships and conducting informal specification testing. We study the properties of this method formally and develop enhanced visualization and econometric binscatter tools. These include estimating conditional means with optimal binning and quantifying uncertainty. We also highlight a methodological problem related to covariate adjustment that can yield incorrect conclusions. We revisit two applications using our methodology and find substantially different results relative to those obtained using prior informal binscatter methods. General purpose software in Python, R, and Stata is provided. Our technical work is of independent interest for the nonparametric partition-based estimation literature.
Submitted 30 April, 2024; v1 submitted 25 February, 2019;
originally announced February 2019.
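As a rough illustration of the basic construction this abstract refers to, the sketch below computes a canonical binned scatter plot: quantile-spaced bins of x and the within-bin means of x and y. It is not the authors' software; the function name `binscatter` and the fixed `n_bins` argument are assumptions, and the optimal binning, covariate adjustment, and uncertainty quantification developed in the paper are not implemented.

```python
import numpy as np

def binscatter(x, y, n_bins=10):
    """Within-bin means of x and y over quantile-spaced bins of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Quantiles of x define bins with roughly equal numbers of observations.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each observation a bin label in 0..n_bins-1 using the interior edges.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    x_means = np.full(n_bins, np.nan)
    y_means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():  # bins can be empty when x has many ties
            x_means[b] = x[in_bin].mean()
            y_means[b] = y[in_bin].mean()
    return x_means, y_means
```

Plotting `y_means` against `x_means` gives the familiar binscatter dots; the number of bins here is an arbitrary choice rather than a data-driven one.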
-
Simple Local Polynomial Density Estimators
Authors:
Matias D. Cattaneo,
Michael Jansson,
Xinwei Ma
Abstract:
This paper introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not require pre-binning or any other transformation of the data. We study the main asymptotic properties of the estimator, and use these results to provide principled estimation, inference, and bandwidth selection methods. As a substantive application of our results, we develop a novel discontinuity in density testing procedure, an important problem in regression discontinuity designs and other program evaluation settings. An illustrative empirical application is given. Two companion Stata and R software packages are provided.
Submitted 7 June, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Regression Discontinuity Designs Using Covariates
Authors:
Sebastian Calonico,
Matias D. Cattaneo,
Max H. Farrell,
Rocio Titiunik
Abstract:
We study regression discontinuity designs when covariates are included in the estimation. We examine local polynomial estimators that include discrete or continuous covariates in an additive separable way, but without imposing any parametric restrictions on the underlying population regression functions. We recommend a covariate-adjustment approach that retains consistency under intuitive conditions, and characterize the potential for estimation and inference improvements. We also present new covariate-adjusted mean squared error expansions and robust bias-corrected inference procedures, with heteroskedasticity-consistent and cluster-robust standard errors. An empirical illustration and an extensive simulation study are presented. All methods are implemented in R and Stata software packages.
Submitted 11 September, 2018;
originally announced September 2018.
-
Characteristic-Sorted Portfolios: Estimation and Inference
Authors:
Matias D. Cattaneo,
Richard K. Crump,
Max H. Farrell,
Ernst Schaumburg
Abstract:
Portfolio sorting is ubiquitous in the empirical finance literature, where it has been widely used to identify pricing anomalies. Despite its popularity, little attention has been paid to the statistical properties of the procedure. We develop a general framework for portfolio sorting by casting it as a nonparametric estimator. We present valid asymptotic inference methods and a valid mean square error expansion of the estimator leading to an optimal choice for the number of portfolios. In practical settings, the optimal choice may be much larger than the standard choices of 5 or 10. To illustrate the relevance of our results, we revisit the size and momentum anomalies.
Submitted 5 October, 2019; v1 submitted 10 September, 2018;
originally announced September 2018.
-
Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs
Authors:
Sebastian Calonico,
Matias D. Cattaneo,
Max H. Farrell
Abstract:
Modern empirical work in Regression Discontinuity (RD) designs often employs local polynomial estimation and inference with a mean square error (MSE) optimal bandwidth choice. This bandwidth yields an MSE-optimal RD treatment effect estimator, but is by construction invalid for inference. Robust bias corrected (RBC) inference methods are valid when using the MSE-optimal bandwidth, but we show they yield suboptimal confidence intervals in terms of coverage error. We establish valid coverage error expansions for RBC confidence interval estimators and use these results to propose new inference-optimal bandwidth choices for forming these intervals. We find that the standard MSE-optimal bandwidth for the RD point estimator is too large when the goal is to construct RBC confidence intervals with the smallest coverage error. We further optimize the constant terms behind the coverage error to derive new optimal choices for the auxiliary bandwidth required for RBC inference. Our expansions also establish that RBC inference yields higher-order refinements (relative to traditional undersmoothing) in the context of RD designs. Our main results cover sharp and sharp kink RD designs under conditional heteroskedasticity, and we discuss extensions to fuzzy and other RD designs, clustered sampling, and pre-intervention covariate adjustments. The theoretical findings are illustrated with a Monte Carlo experiment and an empirical application, and the main methodological results are available in R and Stata packages.
Submitted 2 January, 2020; v1 submitted 1 September, 2018;
originally announced September 2018.
-
Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs
Authors:
Matias D. Cattaneo,
Luke Keele,
Rocio Titiunik,
Gonzalo Vazquez-Bare
Abstract:
In non-experimental settings, the Regression Discontinuity (RD) design is one of the most credible identification strategies for program evaluation and causal inference. However, RD treatment effect estimands are necessarily local, making statistical methods for the extrapolation of these effects a key area for development. We introduce a new method for extrapolation of RD effects that relies on the presence of multiple cutoffs, and is therefore design-based. Our approach employs an easy-to-interpret identifying assumption that mimics the idea of "common trends" in difference-in-differences designs. We illustrate our methods with data on a subsidized loan program on post-education attendance in Colombia, and offer new evidence on program effects for students with test scores away from the cutoff that determined program eligibility.
Submitted 1 April, 2020; v1 submitted 13 August, 2018;
originally announced August 2018.
-
Two-Step Estimation and Inference with Possibly Many Included Covariates
Authors:
Matias D. Cattaneo,
Michael Jansson,
Xinwei Ma
Abstract:
We study the implications of including many covariates in a first-step estimate entering a two-step estimation procedure. We find that a first-order bias emerges when the number of included covariates is "large" relative to the square-root of sample size, rendering standard inference procedures invalid. We show that the jackknife is able to estimate this "many covariates" bias consistently, thereby delivering a new automatic bias-corrected two-step point estimator. The jackknife also consistently estimates the standard error of the original two-step point estimator. For inference, we develop a valid post-bias-correction bootstrap approximation that accounts for the additional variability introduced by the jackknife bias-correction. We find that the jackknife bias-corrected point estimator and the bootstrap post-bias-correction inference perform excellently in simulations, offering important improvements over conventional two-step point estimators and inference procedures, which are not robust to including many covariates. We apply our results to an array of distinct treatment effect, policy evaluation, and other applied microeconomics settings. In particular, we discuss production function and marginal treatment effect estimation in detail.
Submitted 26 July, 2018;
originally announced July 2018.
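The jackknife idea mentioned in this abstract can be illustrated with the textbook delete-one jackknife, sketched below. This is a generic version under stated assumptions, not the paper's two-step procedure: the function name `jackknife` and its `estimator` callback are hypothetical, and the post-bias-correction bootstrap described in the abstract is not implemented.

```python
import numpy as np

def jackknife(estimator, data):
    """Delete-one jackknife: bias-corrected estimate and standard error (generic sketch)."""
    data = np.asarray(data)
    n = data.shape[0]
    theta_full = estimator(data)
    # Recompute the estimator with each observation left out in turn.
    theta_loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    theta_bar = theta_loo.mean()
    bias = (n - 1) * (theta_bar - theta_full)  # jackknife estimate of the bias
    theta_bc = theta_full - bias               # bias-corrected point estimate
    se = np.sqrt((n - 1) / n * np.sum((theta_loo - theta_bar) ** 2))
    return theta_bc, se
```

A standard sanity check is the plug-in variance `np.var`, whose downward bias the jackknife removes exactly, recovering the usual unbiased sample variance.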
-
A Random Attention Model
Authors:
Matias D. Cattaneo,
Xinwei Ma,
Yusufcan Masatlioglu,
Elchin Suleymanov
Abstract:
This paper illustrates how one can deduce preference from observed choices when attention is not only limited but also random. In contrast to earlier approaches, we introduce a Random Attention Model (RAM) where we abstain from any particular attention formation, and instead consider a large class of nonparametric random attention rules. Our model imposes one intuitive condition, termed Monotonic Attention, which captures the idea that each consideration set competes for the decision-maker's attention. We then develop revealed preference theory within RAM and obtain precise testable implications for observable choice probabilities. Based on these theoretical findings, we propose econometric methods for identification, estimation, and inference of the decision maker's preferences. To illustrate the applicability of our results and their concrete empirical content in specific settings, we also develop revealed preference theory and accompanying econometric methods under additional nonparametric assumptions on the consideration set for binary choice problems. Finally, we provide general purpose software implementation of our estimation and inference results, and showcase their performance using simulations.
Submitted 29 August, 2019; v1 submitted 9 December, 2017;
originally announced December 2017.
-
Bootstrap-Based Inference for Cube Root Asymptotics
Authors:
Matias D. Cattaneo,
Michael Jansson,
Kenichi Nagasawa
Abstract:
This paper proposes a valid bootstrap-based distributional approximation for M-estimators exhibiting a Chernoff (1964)-type limiting distribution. For estimators of this kind, the standard nonparametric bootstrap is inconsistent. The method proposed herein is based on the nonparametric bootstrap, but restores consistency by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification leads to a generic and easy-to-implement resampling method for inference that is conceptually distinct from other available distributional approximations. We illustrate the applicability of our results with four examples in econometrics and machine learning.
Submitted 29 May, 2020; v1 submitted 26 April, 2017;
originally announced April 2017.
-
Inference in Linear Regression Models with Many Covariates and Heteroskedasticity
Authors:
Matias D. Cattaneo,
Michael Jansson,
Whitney K. Newey
Abstract:
The linear regression model is widely used in empirical work in Economics, Statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We give inference methods that allow for many covariates and heteroskedasticity. Our results are obtained using high-dimensional approximations, where the number of included covariates is allowed to grow as fast as the sample size. We find that all of the usual versions of Eicker-White heteroskedasticity consistent standard error estimators for linear models are inconsistent under these asymptotics. We then propose a new heteroskedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity of unknown form and the inclusion of possibly many covariates. We apply our findings to three settings: parametric linear models with many covariates, linear panel models with many fixed effects, and semiparametric semi-linear models with many technical regressors. Simulation evidence consistent with our theoretical results is also provided. The proposed methods are also illustrated with an empirical application.
Submitted 16 January, 2017; v1 submitted 9 July, 2015;
originally announced July 2015.