-
From Average Effects to Targeted Assignment: A Causal Machine Learning Analysis of Swiss Active Labor Market Policies
Authors:
Federica Mascolo,
Nora Bearth,
Fabian Muny,
Michael Lechner,
Jana Mareckova
Abstract:
Active labor market policies are widely used by the Swiss government, enrolling over half of all unemployed individuals. This paper evaluates the effectiveness of Swiss programs in improving employment and earnings outcomes using causal machine learning and rich administrative data on unemployed individuals in 2014 and 2015, including detailed labor market histories and other covariates. The findings for Swiss citizens and immigrants with permanent residency indicate a small positive average effect of a Temporary Wage Subsidy program on employment and earnings in the third year after program start. In contrast, Basic Courses, such as job application training, exhibit negative effects on both outcomes over the same period. No significant impacts are found for Employment Programs conducted outside the regular labor market or for Training Courses such as language or computer classes. The programs are most effective for individuals with a non-EU migration background, while Temporary Wage Subsidies also benefit those with lower educational attainment. Finally, shallow policy trees provide practical guidance for improving the targeting of program assignments.
Submitted 11 May, 2025; v1 submitted 30 October, 2024;
originally announced October 2024.
-
Enabling Decision-Making with the Modified Causal Forest: Policy Trees for Treatment Assignment
Authors:
Hugo Bodory,
Federica Mascolo,
Michael Lechner
Abstract:
Decision-making plays a pivotal role in shaping outcomes in various disciplines, such as medicine, economics, and business. This paper provides guidance to practitioners on how to implement a decision tree for treatment assignment policies using an interpretable and non-parametric algorithm. Our Policy Tree is motivated by the method proposed by Zhou, Athey, and Wager (2023) and distinguishes itself through its policy score calculation, its incorporation of constraints, and its handling of categorical and continuous variables. We demonstrate the use of the Policy Tree for multiple, discrete treatments on data sets from different fields. The Policy Tree is available in Python's open-source package mcf (Modified Causal Forest).
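The basic search behind a shallow policy tree can be sketched in a few lines. This is a minimal illustration, assuming that policy scores (one per unit and treatment) have already been estimated: a depth-one tree tries every split point of every covariate and, in each leaf, assigns the treatment with the largest summed score. The function names and data layout are hypothetical; this is not the mcf package's API, and it omits the constraints and categorical-variable handling described above.

```python
from typing import List, Optional, Tuple

def best_treatment(scores: List[List[float]]) -> Tuple[int, float]:
    """Treatment with the largest summed policy score in a leaf, and that sum."""
    n_treat = len(scores[0])
    totals = [sum(row[t] for row in scores) for t in range(n_treat)]
    best = max(range(n_treat), key=lambda t: totals[t])
    return best, totals[best]

def fit_depth1_policy_tree(
    x: List[List[float]], scores: List[List[float]]
) -> Optional[Tuple[int, float, int, int, float]]:
    """Exhaustive search for a depth-one policy tree (hypothetical helper).

    x[i][j]      : value of covariate j for unit i
    scores[i][t] : estimated policy score of treatment t for unit i
    Returns (feature, threshold, left treatment, right treatment, total score),
    or None if no covariate can be split.
    """
    best = None
    for j in range(len(x[0])):
        # Every observed value except the largest keeps both leaves non-empty.
        for thr in sorted({row[j] for row in x})[:-1]:
            left = [s for row, s in zip(x, scores) if row[j] <= thr]
            right = [s for row, s in zip(x, scores) if row[j] > thr]
            t_left, v_left = best_treatment(left)
            t_right, v_right = best_treatment(right)
            if best is None or v_left + v_right > best[4]:
                best = (j, thr, t_left, t_right, v_left + v_right)
    return best

# Toy data: units with x <= 2 gain from treatment 1, the others from treatment 0.
x = [[1.0], [2.0], [3.0], [4.0]]
scores = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
print(fit_depth1_policy_tree(x, scores))  # (0, 2.0, 1, 0, 4.0)
```

Deeper trees repeat the same search recursively within each leaf, which is why computation grows quickly with depth and shallow trees are the practical choice.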
Submitted 4 June, 2024;
originally announced June 2024.
-
Comprehensive Causal Machine Learning
Authors:
Michael Lechner,
Jana Mareckova
Abstract:
Uncovering causal effects in multiple-treatment settings at various levels of granularity provides substantial value to decision makers. Comprehensive machine learning approaches to causal effect estimation allow a single causal machine learning approach to be used for estimation and inference of causal mean effects at all levels of granularity. Focusing on selection-on-observables, this paper compares three such approaches: the modified causal forest (mcf), the generalized random forest (grf), and double machine learning (dml). It also compares the theoretical properties of the approaches and provides theoretical guarantees for the mcf. The findings indicate that dml-based methods excel for average treatment effects at the population level (ATE) and at the group level (GATE) with few groups, when selection into treatment is not too strong. However, for finer causal heterogeneity, explicitly outcome-centred forest-based approaches are superior. The mcf has three additional benefits: (i) it is the most robust estimator in cases where dml-based approaches underperform because of substantial selection into treatment; (ii) it is the best estimator for GATEs when the number of groups gets larger; and (iii) it is the only estimator that is internally consistent, in the sense that low-dimensional causal ATEs and GATEs are obtained as aggregates of finer-grained causal parameters.
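Internal consistency can be illustrated with a small sketch: when group and population effects are computed as averages of the same individual-level effect estimates (IATEs), the ATE is automatically the group-size-weighted average of the GATEs. This assumes the simplest case of equal unit weights; the helper name and the IATE values are made up and do not reflect the mcf implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def aggregate_effects(
    iates: List[float], groups: List[Hashable]
) -> Tuple[Dict[Hashable, float], float]:
    """Aggregate individual effects (IATEs) into group effects (GATEs) and the ATE.

    Each GATE is the mean IATE within its group; the ATE is the mean IATE over
    everyone, which equals the group-size-weighted mean of the GATEs.
    """
    by_group: Dict[Hashable, List[float]] = defaultdict(list)
    for tau, g in zip(iates, groups):
        by_group[g].append(tau)
    gates = {g: sum(v) / len(v) for g, v in by_group.items()}
    ate = sum(iates) / len(iates)
    return gates, ate

iates = [1.0, 3.0, 2.0, 6.0]  # made-up individual effects
groups = ["a", "a", "b", "b"]
gates, ate = aggregate_effects(iates, groups)
print(gates, ate)  # {'a': 2.0, 'b': 4.0} 3.0
```

Estimators that target each aggregation level separately do not guarantee this identity, which is the sense in which they are not internally consistent.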
Submitted 13 February, 2025; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Teamwork and Spillover Effects in Performance Evaluations
Authors:
Enzo Brox,
Michael Lechner
Abstract:
This article shows how coworker performance affects individual performance evaluation in a teamwork setting at the workplace. We use high-quality data on football matches to measure an important component of individual performance, shooting performance, isolated from collaborative effects. Employing causal machine learning methods, we address the assortative matching of workers and estimate both average and heterogeneous effects. There is substantial evidence for spillover effects in performance evaluations. Coworker shooting performance meaningfully affects both manager decisions and third-party expert evaluations of individual performance. Our results underscore the significant role coworkers play in shaping career advancement and highlight a channel, complementary to productivity gains and learning effects, through which coworkers affect career advancement. We characterize the groups of workers that are most and least affected by spillover effects and show that spillover effects are reference-point dependent. While positive deviations from a reference point create positive spillover effects, negative deviations are not harmful for coworkers.
Submitted 22 March, 2024;
originally announced March 2024.
-
Causal Machine Learning for Moderation Effects
Authors:
Nora Bearth,
Michael Lechner
Abstract:
It is valuable for any decision maker to know the impact of decisions (treatments) on average and for subgroups. The causal machine learning literature has recently provided tools for estimating group average treatment effects (GATE) to better describe treatment heterogeneity. This paper addresses the challenge of interpreting such differences in treatment effects between groups while accounting for variations in other covariates. We propose a new parameter, the balanced group average treatment effect (BGATE), which measures a GATE with a specific distribution of a priori-determined covariates. By taking the difference between two BGATEs, we can analyze heterogeneity more meaningfully than by comparing two GATEs, because we can separate the part of the difference due to the different distributions of other variables from the part due to the variable of interest. The main estimation strategy for this parameter is based on double/debiased machine learning for discrete treatments in an unconfoundedness setting, and the estimator is shown to be $\sqrt{N}$-consistent and asymptotically normal under standard conditions. We propose two additional estimation strategies: automatic debiased machine learning and a specific reweighting procedure. Finally, we demonstrate the usefulness of these parameters in a small-scale simulation study and in an empirical example.
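The contrast between a GATE difference and a BGATE difference can be sketched with a known (oracle) conditional effect function tau(z, w), where Z is the heterogeneity variable and W the covariates whose distribution is held fixed. Two assumptions are made for illustration only: tau is known rather than estimated (in practice it would come from, e.g., the double/debiased machine learning strategy mentioned above), and the "specific distribution" of W is taken to be its sample marginal distribution. The function names are hypothetical.

```python
from typing import Callable, List, Tuple

Effect = Callable[[int, int], float]  # tau(z, w): conditional effect given Z=z, W=w

def gate(tau: Effect, data: List[Tuple[int, int]], z: int) -> float:
    """GATE for Z=z: average tau over the W-values observed within group z."""
    effects = [tau(zi, wi) for zi, wi in data if zi == z]
    return sum(effects) / len(effects)

def bgate(tau: Effect, data: List[Tuple[int, int]], z: int) -> float:
    """Balanced GATE for Z=z: fix Z=z but average tau over the W-values of the
    whole sample, so every group is evaluated at the same distribution of W."""
    effects = [tau(z, wi) for _, wi in data]
    return sum(effects) / len(effects)

def tau(z: int, w: int) -> float:
    # Toy effect that depends only on W, not on Z.
    return 1.0 + 2.0 * w

# W = 1 is more common in group Z = 1 than in group Z = 0.
data = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0)]
print(gate(tau, data, 1) - gate(tau, data, 0))    # 1.0, driven by W's distribution
print(bgate(tau, data, 1) - bgate(tau, data, 0))  # 0.0, no difference due to Z itself
```

Here the GATE difference would suggest that Z matters, while the BGATE difference correctly attributes the whole gap to the different composition of W across groups.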
Submitted 9 January, 2025; v1 submitted 16 January, 2024;
originally announced January 2024.
-
The finite sample performance of instrumental variable-based estimators of the Local Average Treatment Effect when controlling for covariates
Authors:
Hugo Bodory,
Martin Huber,
Michael Lechner
Abstract:
This paper investigates the finite sample performance of a range of parametric, semi-parametric, and non-parametric instrumental variable estimators when controlling for a fixed set of covariates to evaluate the local average treatment effect. Our simulation designs are based on empirical labor market data from the US and vary in several dimensions, including effect heterogeneity, instrument selectivity, instrument strength, outcome distribution, and sample size. Among the estimators and simulations considered, non-parametric estimation based on the random forest (a machine learner controlling for covariates in a data-driven way) performs competitively in terms of the average coverage rates of the (bootstrap-based) 95% confidence intervals, while also being relatively precise. Non-parametric kernel regression and certain versions of semi-parametric radius matching on the propensity score, pair matching on the covariates, and inverse probability weighting also achieve decent coverage but are less precise than the random forest-based method. In terms of the average root mean squared error of LATE estimation, kernel regression performs best, closely followed by the random forest method, which has the lowest average absolute bias.
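What all of these estimators target can be made concrete with the simplest fully non-parametric case, in which the covariates are discrete, so that conditional means are just cell averages. The sketch below computes the covariate-adjusted Wald ratio (the intention-to-treat effect divided by the first-stage effect, each averaged over the covariate distribution), assuming a binary instrument, a binary treatment, and a single discrete covariate. The estimators compared in the paper replace the cell means with kernel, forest, matching, or weighting estimates; the function name and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)

def late_discrete_covariate(
    y: Sequence[float], d: Sequence[int], z: Sequence[int], x: Sequence[int]
) -> float:
    """LATE with a binary instrument z (0/1), binary treatment d and discrete covariate x.

    Numerator:   sum over cells x of P(X=x) * (E[Y|X=x,Z=1] - E[Y|X=x,Z=0])
    Denominator: sum over cells x of P(X=x) * (E[D|X=x,Z=1] - E[D|X=x,Z=0])
    """
    # cells[x_value][z_value] -> list of (y, d) pairs
    cells: Dict[int, Dict[int, List]] = defaultdict(lambda: {0: [], 1: []})
    for yi, di, zi, xi in zip(y, d, z, x):
        cells[xi][zi].append((yi, di))
    num = den = 0.0
    for xv, by_z in cells.items():
        if not by_z[0] or not by_z[1]:
            raise ValueError(f"covariate cell {xv} lacks one instrument value")
        share = (len(by_z[0]) + len(by_z[1])) / len(y)
        num += share * (_mean([o[0] for o in by_z[1]]) - _mean([o[0] for o in by_z[0]]))
        den += share * (_mean([o[1] for o in by_z[1]]) - _mean([o[1] for o in by_z[0]]))
    return num / den

y = [3, 1, 1, 1, 5, 5, 2, 4]
d = [1, 0, 0, 0, 1, 1, 0, 1]
z = [1, 1, 0, 0, 1, 1, 0, 0]
x = [0, 0, 0, 0, 1, 1, 1, 1]
print(late_discrete_covariate(y, d, z, x))  # 3.0
```

The check for empty cells reflects the common-support requirement: the instrument must vary within every covariate cell for the conditional contrasts to exist.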
Submitted 14 December, 2022;
originally announced December 2022.
-
Modified Causal Forest
Authors:
Michael Lechner,
Jana Mareckova
Abstract:
Uncovering the heterogeneity of causal effects of policies and business decisions at various levels of granularity provides substantial value to decision makers. This paper develops estimation and inference procedures for multiple treatment models in a selection-on-observed-variables framework by modifying the Causal Forest approach (Wager and Athey, 2018) in several dimensions. The new estimators have desirable theoretical, computational, and practical properties for various aggregation levels of the causal effects. While an Empirical Monte Carlo study suggests that they outperform previously suggested estimators, an application to the evaluation of an active labour market programme shows their value for applied research.
Submitted 8 September, 2022;
originally announced September 2022.
-
Active labour market policies for the long-term unemployed: New evidence from causal machine learning
Authors:
Daniel Goller,
Tamara Harrer,
Michael Lechner,
Joachim Wolff
Abstract:
Active labor market programs are important instruments used by European employment agencies to help the unemployed find work. Drawing on large administrative data on long-term unemployed persons in Germany, we use Causal Machine Learning to analyze the effectiveness of three job search assistance and training programs. Participants benefit from positive effects that materialize quickly and are long-lasting across all programs, with placement services being the most effective. For women, we find that effects differ across various characteristics; in particular, women benefit from better local labor market conditions. We propose more effective data-driven rules, which decision-makers could employ, for allocating the unemployed to the respective labor market programs.
Submitted 29 May, 2023; v1 submitted 18 June, 2021;
originally announced June 2021.
-
The Effect of Sport in Online Dating: Evidence from Causal Machine Learning
Authors:
Daniel Boller,
Michael Lechner,
Gabriel Okasa
Abstract:
Online dating has emerged as a key platform for human mating. Previous research focused on socio-demographic characteristics to explain human mating in online dating environments, neglecting the commonly recognized relevance of sport. This research investigates the effect of sport activity on human mating by exploiting a unique data set from an online dating platform. To do so, we leverage recent advances in the causal machine learning literature to estimate the causal effect of sport frequency on contact chances. We find that, for male users, doing sport on a weekly basis increases the probability of receiving a first message from a woman by 50%, relative to not doing sport at all. For female users, we do not find evidence of such an effect. In addition, for male users, the effect increases with income.
Submitted 7 April, 2021;
originally announced April 2021.
-
Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium
Authors:
Bart Cockx,
Michael Lechner,
Joost Bollens
Abstract:
Based on administrative data on unemployed individuals in Belgium, we estimate the labour market effects of three training programmes at various aggregation levels using Modified Causal Forests, a causal machine learning estimator. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and unemployed individuals. Simulations show that 'black-box' rules that reassign the unemployed to the programmes that maximise estimated individual gains can considerably improve effectiveness: up to 20 percent more (less) time spent in (un)employment within a 30-month window. A shallow policy tree delivers a simple rule that realizes about 70 percent of this gain.
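The arithmetic behind a statement such as "a simple rule realizes about 70 percent of the gain" can be sketched as follows, assuming that estimated potential outcomes for every unit and programme are available. The black-box rule assigns each unit to its best programme; the gain of any rule is measured against the observed allocation. The helper name and all numbers are invented for illustration and are unrelated to the paper's estimates.

```python
from typing import List, Sequence, Tuple

def reassignment_gains(
    pot_outcomes: List[Sequence[float]], observed: List[int], simple_rule: List[int]
) -> Tuple[float, float]:
    """Gain of a 'black-box' rule over the observed allocation, and the share of
    that gain realised by a simpler rule (e.g. one read off a shallow tree).

    pot_outcomes[i][t]: estimated potential outcome of unit i in programme t
    observed[i], simple_rule[i]: programme index assigned to unit i
    """
    def mean_value(assign: Sequence[int]) -> float:
        return sum(po[a] for po, a in zip(pot_outcomes, assign)) / len(assign)

    # Black-box rule: every unit gets the programme with the largest estimated outcome.
    black_box = [max(range(len(po)), key=po.__getitem__) for po in pot_outcomes]
    base = mean_value(observed)
    gain_bb = mean_value(black_box) - base
    share = (mean_value(simple_rule) - base) / gain_bb if gain_bb else float("nan")
    return gain_bb, share

# Invented estimates, e.g. months employed within the evaluation window.
pot = [[10, 12, 11], [8, 8, 9], [5, 7, 6], [9, 6, 6]]
print(reassignment_gains(pot, observed=[0, 0, 0, 0], simple_rule=[1, 1, 1, 0]))
# (1.25, 0.8)
```

By construction, no rule based on the same estimates can beat the black-box assignment, so the share is at most one; a simple rule trades part of the gain for transparency.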
Submitted 17 December, 2022; v1 submitted 30 December, 2019;
originally announced December 2019.
-
Sorting on the Used-Car Market After the Volkswagen Emission Scandal
Authors:
Anthony Strittmatter,
Michael Lechner
Abstract:
The disclosure of the VW emission manipulation scandal caused a quasi-experimental market shock to the observable environmental quality of VW diesel vehicles. To investigate the market reaction to this shock, we collect data from a used-car online advertisement platform. We find that the supply of used VW diesel vehicles increases after the VW emission scandal. The positive supply-side effects increase with the probability of manipulation. Furthermore, we find negative impacts on the asking prices of used cars subject to a high probability of manipulation. We rationalize these findings with a model for sorting by the environmental quality of used cars.
Submitted 26 August, 2019;
originally announced August 2019.
-
Nonparametric estimation of causal heterogeneity under high-dimensional confounding
Authors:
Michael Zimmert,
Michael Lechner
Abstract:
This paper considers the practically important case of nonparametrically estimating heterogeneous average treatment effects that vary with a limited number of discrete and continuous covariates in a selection-on-observables framework where the number of possible confounders is very large. We propose a two-step estimator whose first step is estimated by machine learning. We show that this estimator has desirable statistical properties such as consistency, asymptotic normality, and rate double robustness. In particular, we derive the coupled convergence conditions between the nonparametric and the machine learning steps. We also show that estimating population average treatment effects by averaging the estimated heterogeneous effects is semi-parametrically efficient. The new estimator is illustrated with an empirical example of the effects of mothers' smoking during pregnancy on the resulting birth weight.
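A two-step estimator of this kind can be sketched as follows. Step one forms doubly robust (augmented inverse probability weighting) pseudo-outcomes from first-step nuisance estimates; step two smooths them with a kernel regression on the low-dimensional covariate of interest. Several assumptions are made for illustration: the AIPW score is used as the first-step object (consistent with the double robustness mentioned above, but the paper's exact score may differ), the nuisance functions are passed in as plain functions standing in for machine learning fits, cross-fitting is omitted, and the Gaussian kernel and bandwidth are arbitrary choices. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Nuisance = Callable[[float], float]

def dr_pseudo_outcomes(
    y: Sequence[float], d: Sequence[int], x: Sequence[float],
    mu0: Nuisance, mu1: Nuisance, p: Nuisance,
) -> List[float]:
    """Step 1: doubly robust (AIPW) pseudo-outcomes.

    mu0(x), mu1(x) stand in for ML estimates of E[Y|D=0,X=x] and E[Y|D=1,X=x];
    p(x) stands in for an ML estimate of the propensity score P(D=1|X=x).
    """
    out = []
    for yi, di, xi in zip(y, d, x):
        m0, m1, pi = mu0(xi), mu1(xi), p(xi)
        out.append(m1 - m0 + di * (yi - m1) / pi - (1 - di) * (yi - m0) / (1 - pi))
    return out

def kernel_regression(z0: float, z: Sequence[float], gamma: Sequence[float], h: float) -> float:
    """Step 2: Nadaraya-Watson regression (Gaussian kernel, bandwidth h) of the
    pseudo-outcomes on the low-dimensional covariate z, evaluated at z0."""
    w = [math.exp(-0.5 * ((zi - z0) / h) ** 2) for zi in z]
    return sum(wi * gi for wi, gi in zip(w, gamma)) / sum(w)

# Toy example with a constant effect of 2 and correctly specified nuisances.
def mu0(v: float) -> float:
    return v

def mu1(v: float) -> float:
    return v + 2.0

def p(v: float) -> float:
    return 0.5

x = [0.0, 0.5, 1.0, 1.5]
d = [0, 1, 0, 1]
y = [mu1(xi) if di else mu0(xi) for xi, di in zip(x, d)]
gamma = dr_pseudo_outcomes(y, d, x, mu0, mu1, p)
print(kernel_regression(0.75, x, gamma, h=0.5))  # recovers the constant effect 2.0
```

Averaging the pseudo-outcomes over the whole sample, instead of smoothing them locally, gives the corresponding population average treatment effect.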
Submitted 23 August, 2019;
originally announced August 2019.
-
Random Forest Estimation of the Ordered Choice Model
Authors:
Michael Lechner,
Gabriel Okasa
Abstract:
In this paper we develop a new machine learning estimator for ordered choice models based on the random forest. The proposed Ordered Forest flexibly estimates the conditional choice probabilities while taking the ordering information explicitly into account. Unlike common machine learning estimators, it enables the estimation of marginal effects as well as inference, and thus provides the same output as classical econometric estimators. An extensive simulation study reveals good predictive performance, particularly in settings with non-linearities and near-multicollinearity. An empirical application contrasts the estimation of marginal effects and their standard errors with an ordered logit model. A software implementation of the Ordered Forest is provided in both R and Python, in the package orf available on CRAN and PyPI, respectively.
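One way to take the ordering into account is to estimate cumulative probabilities P(Y <= m | x) for each threshold m and difference them into class probabilities. The sketch below shows only this differencing step, assuming the cumulative probabilities have already been estimated (for example by regression forests fitted to the indicators 1(Y <= m)); it is an illustration, not the orf package's code, and the function name is hypothetical. Because separately estimated cumulative probabilities need not be monotone, negative differences are truncated at zero and the result is renormalised.

```python
from typing import List, Sequence

def class_probs_from_cumulative(cum: Sequence[float]) -> List[float]:
    """Turn estimated cumulative probabilities P(Y<=m|x), m = 1..M-1, into
    class probabilities P(Y=m|x), m = 1..M."""
    full = [0.0, *cum, 1.0]  # P(Y<=0)=0 and P(Y<=M)=1 by definition
    raw = [max(full[m] - full[m - 1], 0.0) for m in range(1, len(full))]
    # The truncated differences sum to at least 1 (they dominate a telescoping
    # sum equal to 1), so the normalising constant is never zero.
    total = sum(raw)
    return [r / total for r in raw]

print(class_probs_from_cumulative([0.2, 0.5]))  # [0.2, 0.3, 0.5] up to rounding
print(class_probs_from_cumulative([0.4, 0.3]))  # middle class truncated to 0.0, rest renormalised
```

Marginal effects then follow by re-evaluating these class probabilities at shifted covariate values.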
Submitted 8 September, 2022; v1 submitted 4 July, 2019;
originally announced July 2019.
-
Modified Causal Forests for Estimating Heterogeneous Causal Effects
Authors:
Michael Lechner
Abstract:
Uncovering the heterogeneity of causal effects of policies and business decisions at various levels of granularity provides substantial value to decision makers. This paper develops new estimation and inference procedures for multiple treatment models in a selection-on-observables framework by modifying the Causal Forest approach suggested by Wager and Athey (2018) in several dimensions. The new estimators have desirable theoretical, computational and practical properties for various aggregation levels of the causal effects. While an Empirical Monte Carlo study suggests that they outperform previously suggested estimators, an application to the evaluation of an active labour market programme shows the value of the new methods for applied research.
Submitted 5 July, 2019; v1 submitted 22 December, 2018;
originally announced December 2018.
-
Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence
Authors:
Michael C. Knaus,
Michael Lechner,
Anthony Strittmatter
Abstract:
We investigate the finite sample performance of causal machine learning estimators for heterogeneous causal effects at different aggregation levels. We employ an Empirical Monte Carlo Study that relies on arguably realistic data generation processes (DGPs) based on actual data. We consider 24 different DGPs, eleven different causal machine learning estimators, and three aggregation levels of the estimated effects. In the main DGPs, we allow for selection into treatment based on a rich set of observable covariates. We provide evidence that the estimators can be categorized into three groups. The first group performs consistently well across all DGPs and aggregation levels. These estimators have multiple steps to account for the selection into the treatment and the outcome process. The second group shows competitive performance only for particular DGPs. The third group is clearly outperformed by the other estimators.
Submitted 17 December, 2018; v1 submitted 31 October, 2018;
originally announced October 2018.
-
Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach
Authors:
Michael Knaus,
Michael Lechner,
Anthony Strittmatter
Abstract:
We systematically investigate the effect heterogeneity of job search programmes for unemployed workers. To investigate possibly heterogeneous employment effects, we combine non-experimental causal empirical models with Lasso-type estimators. The empirical analyses are based on rich administrative data from Swiss social security records. We find considerable heterogeneity only during the first six months after the start of training. Consistent with previous findings in the literature, unemployed persons with fewer employment opportunities profit more from participating in these programmes. We also document heterogeneous employment effects by residence status. Finally, we show the potential of easy-to-implement programme participation rules for improving the average employment effects of these active labour market programmes.
Submitted 11 May, 2018; v1 submitted 29 September, 2017;
originally announced September 2017.