-
The harms of class imbalance corrections for machine learning based prediction models: a simulation study
Authors:
Alex Carriero,
Kim Luijken,
Anne de Hond,
Karel GM Moons,
Ben van Calster,
Maarten van Smeden
Abstract:
Risk prediction models are increasingly used in healthcare to aid in clinical decision making. In most clinical contexts, model calibration (i.e., the reliability of risk estimates) is critical. Data available for model development are often not perfectly balanced with respect to the modeled outcome (i.e., individuals with vs. without the event of interest are not equally represented in the data). It is common for researchers to correct this class imbalance, yet the effect of such imbalance corrections on the calibration of machine learning models is largely unknown. We studied the effect of imbalance corrections on model calibration for a variety of machine learning algorithms. Using extensive Monte Carlo simulations, we compared the out-of-sample predictive performance of models developed with an imbalance correction to those developed without a correction for class imbalance across different data-generating scenarios (varying sample size, number of predictors and event fraction). Our findings were illustrated in a case study using MIMIC-III data. In all simulation scenarios, prediction models developed without a correction for class imbalance consistently had equal or better calibration performance than prediction models developed with a correction for class imbalance. The miscalibration introduced by correcting for class imbalance was characterized by an over-estimation of risk and could not always be corrected by re-calibration. Correcting for class imbalance is not always necessary and may even be harmful for clinical prediction models which aim to produce reliable risk estimates on an individual basis.
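The direction of the miscalibration described in this abstract can be illustrated with a small sketch. It is not the authors' simulation code. It assumes an idealized setting: a correctly specified model, a large sample, and random undersampling that keeps every event but only a fraction of non-events. Under those assumptions the odds learned from the resampled data equal the true odds divided by that fraction, so predicted risks are inflated. All function names below are hypothetical and chosen for illustration only.

```python
def keep_fraction_for_balance(event_fraction: float) -> float:
    """Fraction of non-events to keep so that events and non-events
    are equally frequent after random undersampling (assumes the
    event is the minority class, i.e. event_fraction <= 0.5)."""
    if not 0.0 < event_fraction <= 0.5:
        raise ValueError("event_fraction must be in (0, 0.5]")
    return event_fraction / (1.0 - event_fraction)


def risk_after_undersampling(true_risk: float, keep_fraction: float) -> float:
    """Risk implied by an idealized model fitted to undersampled data.

    Keeping only `keep_fraction` of non-events divides the odds of the
    event by `keep_fraction`, which shifts every predicted risk upward.
    """
    if not 0.0 <= true_risk <= 1.0:
        raise ValueError("true_risk must be in [0, 1]")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    return true_risk / (true_risk + keep_fraction * (1.0 - true_risk))


def undo_odds_shift(shifted_risk: float, keep_fraction: float) -> float:
    """Invert the odds shift (only exact in the idealized setting)."""
    scaled = keep_fraction * shifted_risk
    return scaled / (scaled + (1.0 - shifted_risk))


if __name__ == "__main__":
    s = keep_fraction_for_balance(0.10)  # 10% event fraction
    for p in (0.02, 0.10, 0.30):
        inflated = risk_after_undersampling(p, s)
        print(f"true risk {p:.2f} -> predicted {inflated:.3f} "
              f"-> after undoing shift {undo_odds_shift(inflated, s):.2f}")
```

In this idealized case the shift is a constant change in log-odds and can be undone exactly. The abstract reports that, for the machine learning models and finite samples studied, re-calibration did not always succeed, so the sketch should be read only as an intuition for why risk is over-estimated.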
Submitted 30 April, 2024;
originally announced April 2024.
-
The risks of risk assessment: causal blind spots when using prediction models for treatment decisions
Authors:
Nan van Geloven,
Ruth H Keogh,
Wouter van Amsterdam,
Giovanni Cinà,
Jesse H. Krijthe,
Niels Peek,
Kim Luijken,
Sara Magliacane,
Paweł Morzywołek,
Thijs van Ommen,
Hein Putter,
Matthew Sperrin,
Junfeng Wang,
Daniala L. Weir,
Vanessa Didelez
Abstract:
Clinicians increasingly rely on prediction models to guide treatment choices. Most prediction models, however, are developed using observational data that include some patients who have already received the treatment the prediction model is meant to inform. Special attention to the causal role of those earlier treatments is required when interpreting the resulting predictions. We identify 'causal blind spots' in three common approaches to handling treatment when developing a prediction model: including treatment as a predictor, restricting to individuals taking a certain treatment, and ignoring treatment. Through several real examples, we illustrate how the risks obtained from models developed using such approaches may be misinterpreted and can lead to misinformed decision-making. Our discussion covers issues attributable to confounding, selection, mediation and changes in treatment protocols over time. We advocate for an extension of guidelines for the development, reporting and evaluation of prediction models to avoid such misinterpretations. Developers must ensure that the intended target population for the model, and the treatment conditions under which predictions hold, are clearly communicated. When prediction models are intended to inform treatment decisions, they need to provide estimates of risk under the specific treatment (or intervention) options being considered, known as 'prediction under interventions'. Next to suitable data, this requires causal reasoning and causal inference techniques during model development and evaluation. Being clear about what a given prediction model can and cannot be used for prevents misinformed treatment decisions and thereby prevents potential harm to patients.
Submitted 16 June, 2025; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Risk-based decision making: estimands for sequential prediction under interventions
Authors:
Kim Luijken,
Paweł Morzywołek,
Wouter van Amsterdam,
Giovanni Cinà,
Jeroen Hoogland,
Ruth Keogh,
Jesse Krijthe,
Sara Magliacane,
Thijs van Ommen,
Niels Peek,
Hein Putter,
Maarten van Smeden,
Matthew Sperrin,
Junfeng Wang,
Daniala Weir,
Vanessa Didelez,
Nan van Geloven
Abstract:
Prediction models are used, among other purposes, to inform medical decisions on interventions. Typically, individuals with high risks of adverse outcomes are advised to undergo an intervention while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to inform such decisions: e.g., an individual may be estimated to be at low risk because similar individuals in the past received an intervention which lowered their risk. Therefore, prediction models supporting decisions should target risks belonging to defined intervention strategies. Previous works on prediction under interventions assumed that the prediction model was used only at one time point to make an intervention decision. In clinical practice, intervention decisions are rarely made only once: they might be repeated, deferred and re-evaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations by giving examples of estimands for a case study about choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies to ensure that the right question is answered and appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions.
Submitted 29 November, 2023;
originally announced November 2023.
-
Replicability of Simulation Studies for the Investigation of Statistical Methods: The RepliSims Project
Authors:
K. Luijken,
A. Lohmann,
U. Alter,
J. Claramunt Gonzalez,
F. J. Clouth,
J. L. Fossum,
L. Hesen,
A. H. J. Huizing,
J. Ketelaar,
A. K. Montoya,
L. Nab,
R. C. C. Nijman,
B. B. L. Penning de Vries,
T. D. Tibbe,
Y. A. Wang,
R. H. H. Groenwold
Abstract:
Results of simulation studies evaluating the performance of statistical methods are often considered actionable and thus can have a major impact on the way empirical research is implemented. However, so far there is limited evidence about the reproducibility and replicability of statistical simulation studies. Therefore, eight highly cited statistical simulation studies were selected, and their replicability was assessed by teams of replicators with formal training in quantitative methodology. The teams found relevant information in the original publications and used it to write simulation code with the aim of replicating the results. The primary outcome was the feasibility of replicability based on reported information in the original publications. Replicability varied greatly: Some original studies provided detailed information leading to almost perfect replication of results, whereas other studies did not provide enough information to implement any of the reported simulations. Replicators had to make choices regarding missing or ambiguous information in the original studies, error handling, and software environment. Factors facilitating replication included public availability of code, and descriptions of the data-generating procedure and methods in graphs, formulas, structured text, and publicly accessible additional resources such as technical reports. Replicability of statistical simulation studies was mainly impeded by lack of information and sustainability of information sources. Reproducibility could be achieved for simulation studies by providing open code and data as a supplement to the publication. Additionally, simulation studies should be transparently reported with all relevant information either in the research paper itself or in easily accessible supplementary material to allow for replicability.
Submitted 5 July, 2023;
originally announced July 2023.
-
Prediction meets causal inference: the role of treatment in clinical prediction models
Authors:
Nan van Geloven,
Sonja Swanson,
Chava Ramspek,
Kim Luijken,
Merel van Diepen,
Tim Morris,
Rolf Groenwold,
Hans van Houwelingen,
Hein Putter,
Saskia le Cessie
Abstract:
In this paper we study approaches for dealing with treatment when developing a clinical prediction model. Analogous to the estimand framework recently proposed by the European Medicines Agency for clinical trials, we propose a 'predictimand' framework of different questions that may be of interest when predicting risk in relation to treatment started after baseline. We provide a formal definition of the estimands matching these questions, give examples of settings in which each is useful and discuss appropriate estimators including their assumptions. We illustrate the impact of the predictimand choice in a dataset of patients with end-stage kidney disease. We argue that clearly defining the estimand is equally important in prediction research as in causal inference.
Submitted 15 April, 2020;
originally announced April 2020.
-
Impact of predictor measurement heterogeneity across settings on performance of prediction models: a measurement error perspective
Authors:
Kim Luijken,
Rolf H. H. Groenwold,
Ben van Calster,
Ewout W. Steyerberg,
Maarten van Smeden
Abstract:
It is widely acknowledged that the predictive performance of clinical prediction models should be studied in patients that were not part of the data in which the model was derived. Out-of-sample performance can be hampered when predictors are measured differently at derivation and external validation. This may occur, for instance, when predictors are measured using different measurement protocols or when tests are produced by different manufacturers. Although such heterogeneity in predictor measurement between derivation and validation data is common, the impact on the out-of-sample performance is not well studied. Using analytical and simulation approaches, we examined out-of-sample performance of prediction models under various scenarios of heterogeneous predictor measurement. These scenarios were defined and clarified using an established taxonomy of measurement error models. The results of our simulations indicate that predictor measurement heterogeneity can induce miscalibration of predictions and affect discrimination and overall predictive accuracy, to such an extent that the prediction model may no longer be considered clinically useful. The measurement error taxonomy was found to be helpful in identifying and predicting effects of heterogeneous predictor measurements between settings of prediction model derivation and validation. Our work indicates that homogeneity of measurement strategies across settings is of paramount importance in prediction research.
Submitted 5 February, 2019; v1 submitted 27 June, 2018;
originally announced June 2018.