-
Bayesian Causal Forests for Longitudinal Data: Assessing the Impact of Part-Time Work on Growth in High School Mathematics Achievement
Authors:
Nathan McJames,
Ann O'Shea,
Andrew Parnell
Abstract:
Modelling growth in student achievement is a significant challenge in the field of education. Understanding how interventions or experiences such as part-time work can influence this growth is also important. Traditional methods like difference-in-differences are effective for estimating causal effects from longitudinal data. Meanwhile, Bayesian non-parametric methods have recently become popular…
▽ More
Modelling growth in student achievement is a significant challenge in the field of education. Understanding how interventions or experiences such as part-time work can influence this growth is also important. Traditional methods like difference-in-differences are effective for estimating causal effects from longitudinal data. Meanwhile, Bayesian non-parametric methods have recently become popular for estimating causal effects from single time point observational studies. However, there remains a scarcity of methods capable of combining the strengths of these two approaches to flexibly estimate heterogeneous causal effects from longitudinal data. Motivated by two waves of data from the High School Longitudinal Study, the NCES' most recent longitudinal study which tracks a representative sample of over 20,000 students in the US, our study introduces a longitudinal extension of Bayesian Causal Forests. This model allows for the flexible identification of both individual growth in mathematical ability and the effects of participation in part-time work. Simulation studies demonstrate the predictive performance and reliable uncertainty quantification of the proposed model. Results reveal the negative impact of part time work for most students, but hint at potential benefits for those students with an initially low sense of school belonging. Clear signs of a widening achievement gap between students with high and low academic achievement are also identified. Potential policy implications are discussed, along with promising areas for future research.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Bayesian Causal Forests for Multivariate Outcomes: Application to Irish Data From an International Large Scale Education Assessment
Authors:
Nathan McJames,
Andrew Parnell,
Yong Chen Goh,
Ann O'Shea
Abstract:
Bayesian Causal Forests (BCF) is a causal inference machine learning model based on a highly flexible non-parametric regression and classification tool called Bayesian Additive Regression Trees (BART). Motivated by data from the Trends in International Mathematics and Science Study (TIMSS), which includes data on student achievement in both mathematics and science, we present a multivariate extens…
▽ More
Bayesian Causal Forests (BCF) is a causal inference machine learning model based on a highly flexible non-parametric regression and classification tool called Bayesian Additive Regression Trees (BART). Motivated by data from the Trends in International Mathematics and Science Study (TIMSS), which includes data on student achievement in both mathematics and science, we present a multivariate extension of the BCF algorithm. With the help of simulation studies we show that our approach can accurately estimate causal effects for multiple outcomes subject to the same treatment. We also apply our model to Irish data from TIMSS 2019. Our findings reveal the positive effects of having access to a study desk at home (Mathematics ATE 95% CI: [0.20, 11.67]) while also highlighting the negative consequences of students often feeling hungry at school (Mathematics ATE 95% CI: [-11.15, -2.78] , Science ATE 95% CI: [-10.82,-1.72]) or often being absent (Mathematics ATE 95% CI: [-12.47, -1.55]).
△ Less
Submitted 8 March, 2023;
originally announced March 2023.
-
A Supervised Learning Approach to Rankability
Authors:
Nathan McJames,
David Malone,
Oliver Mason
Abstract:
The rankability of data is a recently proposed problem that considers the ability of a dataset, represented as a graph, to produce a meaningful ranking of the items it contains. To study this concept, a number of rankability measures have recently been proposed, based on comparisons to a complete dominance graph via combinatorial and linear algebraic methods. In this paper, we review these measure…
▽ More
The rankability of data is a recently proposed problem that considers the ability of a dataset, represented as a graph, to produce a meaningful ranking of the items it contains. To study this concept, a number of rankability measures have recently been proposed, based on comparisons to a complete dominance graph via combinatorial and linear algebraic methods. In this paper, we review these measures and highlight some questions to which they give rise before going on to propose new methods to assess rankability, which are amenable to efficient estimation. Finally, we compare these measures by applying them to both synthetic and real-life sports data.
△ Less
Submitted 14 March, 2022;
originally announced March 2022.
-
Accounting for shared covariates in semi-parametric Bayesian additive regression trees
Authors:
Estevão B. Prado,
Andrew C. Parnell,
Keefe Murphy,
Nathan McJames,
Ann O'Shea,
Rafael A. Moral
Abstract:
We propose some extensions to semi-parametric models based on Bayesian additive regression trees (BART). In the semi-parametric BART paradigm, the response variable is approximated by a linear predictor and a BART model, where the linear component is responsible for estimating the main effects and BART accounts for non-specified interactions and non-linearities. Previous semi-parametric models bas…
▽ More
We propose some extensions to semi-parametric models based on Bayesian additive regression trees (BART). In the semi-parametric BART paradigm, the response variable is approximated by a linear predictor and a BART model, where the linear component is responsible for estimating the main effects and BART accounts for non-specified interactions and non-linearities. Previous semi-parametric models based on BART have assumed that the set of covariates in the linear predictor and the BART model are mutually exclusive in an attempt to avoid poor coverage properties and reduce bias in the estimates of the parameters in the linear predictor. The main novelty in our approach lies in the way we change the tree-generation moves in BART to deal with this bias and resolve non-identifiability issues between the parametric and non-parametric components, even when they have covariates in common. This allows us to model complex interactions involving the covariates of primary interest, both among themselves and with those in the BART component. Our novel method is developed with a view to analysing data from an international education assessment, where certain predictors of students' achievements in mathematics are of particular interpretational interest. Through additional simulation studies and another application to a well-known benchmark dataset, we also show competitive performance when compared to regression models, alternative formulations of semi-parametric BART, and other tree-based methods. The implementation of the proposed method is available at \url{https://github.com/ebprado/CSP-BART}.
△ Less
Submitted 30 July, 2024; v1 submitted 17 August, 2021;
originally announced August 2021.