-
Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect
Authors:
Mohammad Kaviul Anam Khan,
Olli Saarela,
Rafal Kustra
Abstract:
Interpreting black-box machine learning models is challenging due to their strong dependence on data and inherently non-parametric nature. This paper reintroduces the concept of importance through the "Marginal Variable Importance Metric" (MVIM), a model-agnostic measure of predictor importance based on the true conditional expectation function. MVIM evaluates predictors' influence on continuous or discrete outcomes. A permutation-based estimation approach, inspired by Breiman (2001) and Fisher et al. (2019), is proposed to estimate MVIM. The MVIM estimator is biased when predictors are highly correlated, as black-box models struggle to extrapolate in low-probability regions. To address this, we investigate the bias-variance decomposition of MVIM to understand the source and pattern of the bias under high correlation. A Conditional Variable Importance Metric (CVIM), adapted from Strobl et al. (2008), is introduced to reduce this bias. Both MVIM and CVIM exhibit a quadratic relationship with the conditional average treatment effect (CATE).
Submitted 28 January, 2025; v1 submitted 28 January, 2025;
originally announced January 2025.
-
Improving the interpretability of GNN predictions through conformal-based graph sparsification
Authors:
Pablo Sanchez-Martin,
Kinaan Aamir Khan,
Isabel Valera
Abstract:
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in solving graph classification tasks. However, most GNN architectures aggregate information from all nodes and edges in a graph, regardless of their relevance to the task at hand, thus hindering the interpretability of their predictions. In contrast to prior work, in this paper we propose a GNN training approach that jointly i) finds the most predictive subgraph by removing edges and/or nodes, without making assumptions about the subgraph structure, while ii) optimizing the performance of the graph classification task. To that end, we rely on reinforcement learning to solve the resulting bi-level optimization with a reward function based on conformal predictions to account for the current in-training uncertainty of the classifier. Our empirical results on nine different graph classification datasets show that our method competes in performance with baselines while relying on significantly sparser subgraphs, leading to more interpretable GNN-based predictions.
Submitted 18 April, 2024;
originally announced April 2024.
-
Re-thinking Spatial Confounding in Spatial Linear Mixed Models
Authors:
Kori Khan,
Candace Berrett
Abstract:
In the last two decades, considerable research has been devoted to a phenomenon known as spatial confounding. Spatial confounding is thought to occur when there is multicollinearity between a covariate and the random effect in a spatial regression model. This multicollinearity is considered highly problematic when the inferential goal is estimating regression coefficients, and various methodologies have been proposed to attempt to alleviate it. Recently, it has become apparent that many of these methodologies are flawed, yet the field continues to expand. In this paper, we offer a novel perspective for synthesizing the work in the field of spatial confounding. We propose that at least two distinct phenomena are currently conflated under the term spatial confounding. We refer to these as the "analysis model" and the "data generation" types of spatial confounding. We show that these two issues can lead to contradictory conclusions about whether spatial confounding exists and whether methods to alleviate it will improve inference. Our results also illustrate that in most cases, traditional spatial linear mixed models do help to improve inference on regression coefficients. Drawing on the insights gained, we offer a path forward for research in spatial confounding.
Submitted 21 June, 2024; v1 submitted 13 January, 2023;
originally announced January 2023.
-
A Generalized Variable Importance Metric and Estimator for Black Box Machine Learning Models
Authors:
Mohammad Kaviul Anam Khan,
Olli Saarela,
Rafal Kustra
Abstract:
In this paper we define a population parameter, the "Generalized Variable Importance Metric" (GVIM), to measure the importance of predictors for black-box machine learning methods, where importance is not represented by a model-based parameter. GVIM is defined for each input variable, using the true conditional expectation function, and it measures the variable's importance in affecting a continuous or a binary response. We extend previously published results to show that the defined GVIM can be represented as a function of the Conditional Average Treatment Effect (CATE) for any kind of predictor, which gives it a causal interpretation and further justification as an alternative to classical measures of significance that are only available in simple parametric models. An extensive set of simulations, using realistically complex relationships between covariates and outcomes and a number of regression techniques of varying degrees of complexity, shows the performance of our proposed estimator of the GVIM.
Submitted 23 December, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Shining a Light on Forensic Black-Box Studies
Authors:
Kori Khan,
Alicia L. Carriquiry
Abstract:
Forensic science plays a critical role in the United States criminal justice system. For decades, many feature-based fields of forensic science, such as firearm and toolmark identification, developed outside the scientific community's purview. The results of these studies are widely relied on by judges nationwide. However, this reliance is misplaced. Black-box studies to date suffer from inappropriate sampling methods and high rates of missingness. Current black-box studies ignore both problems in arriving at the error rate estimates presented to courts. We explore the impact of each type of limitation using available data from black-box studies and court materials. We show that black-box studies rely on non-representative samples of examiners. Using a case study of a popular ballistics study, we find evidence that these unrepresentative samples may commit fewer errors than the wider population from which they came. We also find evidence that the missingness in black-box studies is non-ignorable. Using data from a recent latent print study, we show that ignoring this missingness likely results in systematic underestimates of error rates. Finally, we offer concrete steps to overcome these limitations.
Submitted 1 June, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Computing with R-INLA: Accuracy and reproducibility with implications for the analysis of COVID-19 data
Authors:
Kori Khan,
Hengrui Luo,
Wenna Xi
Abstract:
The statistical methods used to analyze medical data are becoming increasingly complex. Novel statistical methods increasingly rely on simulation studies to assess their validity. Such assessments typically appear in statistical or computational journals, and the methodology is later introduced to the medical community through tutorials. This can be problematic if applied researchers use the methodologies in settings that have not been evaluated. In this paper, we explore a case study of one such method that has become popular in the analysis of coronavirus disease 2019 (COVID-19) data. The integrated nested Laplace approximations (INLA) method, as implemented in the R-INLA package, approximates the marginal posterior distributions of target parameters that would have been obtained from a fully Bayesian analysis. We seek to answer an important question: Does existing research on the accuracy of INLA's approximations support how researchers are currently using it to analyze COVID-19 data? We identify three limitations of work assessing INLA's accuracy: 1) inconsistent definitions of accuracy, 2) a lack of studies validating how researchers are actually using INLA, and 3) a lack of research into the reproducibility of INLA's output. We explore the practical impact of each limitation with simulation studies based on models and data used in COVID-19 research. Our results suggest that existing methods of assessing the accuracy of the INLA technique may not support how COVID-19 researchers are using it. Guided in part by our results, we offer a proposed set of minimum guidelines for researchers using statistical methodologies primarily validated through simulation studies.
Submitted 1 November, 2021;
originally announced November 2021.
-
Restricted Spatial Regression Methods: Implications for Inference
Authors:
Kori Khan,
Catherine A. Calder
Abstract:
The issue of spatial confounding between the spatial random effect and the fixed effects in regression analyses has been identified as a concern in the statistical literature. Multiple authors have offered perspectives and potential solutions. In this paper, for the areal spatial data setting, we show that many of the methods designed to alleviate spatial confounding can be viewed as special cases of a general class of models. We refer to this class as Restricted Spatial Regression (RSR) models, extending terminology currently in use. We offer a mathematically based exploration of the impact that RSR methods have on inference for regression coefficients for the linear model. We then explore whether these results hold in the generalized linear model setting for count data using simulations. We show that the use of these methods has counterintuitive consequences that defy the general expectations in the literature. In particular, our results and the accompanying simulations suggest that RSR methods will typically perform worse than non-spatial methods. These results have important implications for dimension reduction strategies in spatial regression modeling. Specifically, we demonstrate that the problems with RSR models cannot be fixed with a selection of "better" spatial basis vectors or dimension reduction techniques.
Submitted 18 August, 2020; v1 submitted 22 May, 2019;
originally announced May 2019.