Holdout sets for safe predictive model updating

Haidar-Wehbe, Sami; Emerson, Samuel R; Aslett, Louis J M; Liley, James

arXiv:2202.06374 (stat)

[Submitted on 13 Feb 2022 (v1), last revised 19 Dec 2024 (this version, v5)]

Abstract: Predictive risk scores for adverse outcomes are increasingly crucial in guiding health interventions. Such scores may need to be periodically updated as the distributions they model change. However, directly updating a risk score that is used to guide intervention can lead to biased risk estimates. To address this, we propose updating using a 'holdout set': a subset of the population that does not receive interventions guided by the risk score. The holdout set size must be balanced to ensure good performance of the updated risk score whilst minimising the number of held-out samples. We prove that this approach reduces adverse outcome frequency to an asymptotically optimal level and argue that often there is no competitive alternative. We describe conditions under which an optimal holdout size (OHS) can be readily identified, and introduce parametric and semi-parametric algorithms for OHS estimation. We apply our methods to the ASPRE risk score for pre-eclampsia to recommend a plan for updating it in the presence of change in the underlying data distribution. We show that the number of pre-eclampsia cases over time is best minimised using a holdout set of around 10,000 individuals.
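The trade-off behind the holdout set size can be illustrated with a small sketch. It is not the parametric or semi-parametric algorithm from the paper. It assumes, purely for illustration, that each held-out individual incurs a fixed expected cost `k1`, that each individual guided by the updated score incurs a cost `k2(n) = a * n**(-b) + c` that falls as the holdout size `n` grows, and that the total cost over a population of `N` is `k1 * n + k2(n) * (N - n)`. The function names, the power-law form and all numeric values are hypothetical.

```python
def total_cost(n, N, k1, a, b, c):
    """Total expected cost when n of N individuals are held out.

    Assumed (hypothetical) cost model:
      - each held-out individual costs k1;
      - each remaining individual costs k2(n) = a * n**(-b) + c,
        which decreases as the holdout set (training data) grows.
    """
    k2 = a * n ** (-b) + c
    return k1 * n + k2 * (N - n)


def optimal_holdout_size(N, k1, a, b, c):
    """Exhaustive search over integer holdout sizes 1..N-1.

    Returns the size with the lowest total cost under the assumed model.
    """
    return min(range(1, N), key=lambda n: total_cost(n, N, k1, a, b, c))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    N = 100_000
    params = dict(k1=0.4, a=10.0, b=0.5, c=0.1)
    n_opt = optimal_holdout_size(N, **params)
    print("holdout size:", n_opt)
    print("total cost:", total_cost(n_opt, N, **params))
```

A very small holdout set leaves the updated score poorly trained, so everyone else bears a high cost. A very large one withholds guided intervention from too many people. Under this assumed model the exhaustive search picks the size that balances the two effects.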

Comments:	Manuscript includes supplementary materials and figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:2202.06374 [stat.ML]
	(or arXiv:2202.06374v5 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2202.06374

Submission history

From: James Liley
[v1] Sun, 13 Feb 2022 18:04:00 UTC (2,295 KB)
[v2] Thu, 17 Feb 2022 13:33:29 UTC (2,043 KB)
[v3] Fri, 8 Jul 2022 13:32:57 UTC (1,728 KB)
[v4] Mon, 31 Jul 2023 11:39:21 UTC (1,530 KB)
[v5] Thu, 19 Dec 2024 10:12:00 UTC (724 KB)
