-
A general sample size framework for developing or updating a clinical prediction model
Authors:
Richard D Riley,
Rebecca Whittle,
Mohsen Sadatsafavi,
Glen P. Martin,
Alexander Pate,
Gary S. Collins,
Joie Ensor
Abstract:
Aims: To propose a general sample size framework for developing or updating a clinical prediction model using any statistical or machine learning method, based on drawing samples from anticipated posterior distributions and targeting assurance in predictive performance.
Methods: Users provide a reference model (eg, matching outcome incidence, predictor weights and c-statistic of previous models), and a (synthetic) dataset reflecting the joint distribution of candidate predictors in the target population. Then a fully simulation-based approach allows the impact of a chosen development sample size and modelling strategy to be examined. This generates thousands of models and, by applying each to the target population, leads to posterior distributions of individual predictions and model performance (degradation) metrics, to inform required sample size. To improve computation speed for penalised regression, we also propose a one-sample Bayesian analysis combining shrinkage priors with a likelihood decomposed into sample size and Fisher's information.
Results: The framework is illustrated when developing pre-eclampsia prediction models using logistic regression (unpenalised, uniform shrinkage, lasso or ridge) and random forests. We show it encompasses existing sample size calculation criteria whilst providing model assurance probabilities, instability metrics and degradation statistics about calibration, discrimination, clinical utility, prediction error and fairness. Crucially, the required sample size depends on the users' key estimands and planned model development or updating approach.
Conclusions: The framework generalises existing sample size proposals for model development by utilising anticipated posterior distributions conditional on a chosen sample size and development strategy. This informs the sample size required to target appropriate model performance.
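The simulation loop described in the Methods can be illustrated with a deliberately small sketch. It is not the authors' implementation. It assumes a hypothetical reference logistic model with one standard-normal predictor (intercept -1.5, slope 0.8), draws repeated development datasets of size n from that model, fits unpenalised logistic regression by Newton-Raphson, and records the prediction for one individual (x = 1). The helper names and the 0.05 tolerance used for the "assurance" summary are illustrative choices.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iters=15):
    """Unpenalised logistic regression logit(p) = b0 + b1*x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with H the observed information
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def simulate_predictions(n, x_new, beta=(-1.5, 0.8), n_sims=200, seed=1):
    """Predictions for one individual from models fitted to n_sims development samples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [int(rng.random() < expit(beta[0] + beta[1] * x)) for x in xs]
        b0, b1 = fit_logistic(xs, ys)
        preds.append(expit(b0 + b1 * x_new))
    return preds

true_risk = expit(-1.5 + 0.8 * 1.0)
preds = simulate_predictions(n=300, x_new=1.0)
# Assurance: share of fitted models whose prediction is within 0.05 of the true risk
assurance = sum(abs(p - true_risk) <= 0.05 for p in preds) / len(preds)
print(f"true risk {true_risk:.3f}, mean prediction {sum(preds) / len(preds):.3f}, "
      f"assurance {assurance:.2f}")
```

Re-running with larger n and checking whether the assurance reaches a pre-specified target mirrors, in miniature, the idea of choosing n from the anticipated distribution of model performance; the framework itself also covers calibration, discrimination, clinical utility and fairness metrics, which this sketch omits.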
Submitted 25 April, 2025;
originally announced April 2025.
-
Compatibility of Missing Data Handling Methods across the Stages of Producing Clinical Prediction Models
Authors:
Antonia Tsvetanova,
Matthew Sperrin,
David A. Jenkins,
Niels Peek,
Iain Buchan,
Stephanie Hyland,
Marcus Taylor,
Angela Wood,
Richard D. Riley,
Glen P. Martin
Abstract:
Missing data is a challenge when developing, validating and deploying clinical prediction models (CPMs). Traditionally, decisions concerning missing data handling during CPM development and validation have not accounted for whether missingness is allowed at deployment. We hypothesised that the missing data approach used during model development should optimise model performance upon deployment, whilst the approach used during model validation should yield unbiased predictive performance estimates upon deployment; we term this compatibility. We aimed to determine which combinations of missing data handling methods across the CPM life cycle are compatible. We considered scenarios where CPMs are intended to be deployed with missing data allowed or not, and we evaluated the impact of that choice on earlier modelling decisions. Through a simulation study and an empirical analysis of thoracic surgery data, we compared CPMs developed and validated using combinations of complete case analysis, mean imputation, single regression imputation, multiple imputation, and pattern sub-modelling. If planning to deploy a CPM without allowing missing data, then development and validation should use multiple imputation when required. Where missingness is allowed at deployment, the same imputation method must be used during development and validation. Commonly used combinations of missing data handling methods result in biased predictive performance estimates.
Submitted 7 May, 2025; v1 submitted 9 April, 2025;
originally announced April 2025.
-
A decomposition of Fisher's information to inform sample size for developing fair and precise clinical prediction models -- Part 2: time-to-event outcomes
Authors:
Richard D Riley,
Gary S Collins,
Lucinda Archer,
Rebecca Whittle,
Amardeep Legha,
Laura Kirton,
Paula Dhiman,
Mohsen Sadatsafavi,
Nicola J Adderley,
Joseph Alderman,
Glen P Martin,
Joie Ensor
Abstract:
Background: When developing a clinical prediction model using time-to-event data, previous research has focused on the sample size required to minimise overfitting and precisely estimate the overall risk. However, instability of individual-level risk estimates may still be large. Methods: We propose a decomposition of Fisher's information matrix to examine and calculate the sample size required for developing a model that aims for precise and fair risk estimates. We propose a six-step process which can be used before data collection or when an existing dataset is available. Steps (1) to (5) require researchers to specify the overall risk in the target population at a key time-point of interest; an assumed pragmatic 'core model' in the form of an exponential regression model; the (anticipated) joint distribution of core predictors included in that model; and the distribution of any censoring. Results: We derive closed-form solutions that decompose the variance of an individual's estimated event rate into Fisher's unit information matrix, predictor values and total sample size; this allows researchers to calculate and examine uncertainty distributions around individual risk estimates and misclassification probabilities for specified sample sizes. We provide an illustrative example in breast cancer and emphasise the importance of clinical context, including risk thresholds for decision making, and examine fairness concerns for pre- and post-menopausal women. Lastly, in two empirical evaluations, we provide reassurance that uncertainty interval widths based on our approach are close to those obtained using more flexible models. Conclusions: Our approach allows users to identify the (target) sample size required to develop a prediction model for time-to-event outcomes, via the pmstabilityss module. It aims to facilitate models with improved trust, reliability and fairness in individual-level predictions.
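The variance decomposition in the Results can be illustrated for the simplest case. This is a minimal sketch, not the pmstabilityss implementation. It assumes a core exponential model log(rate) = b0 + b1*x with one binary predictor, administrative censoring only (a simplification of the censoring distribution the paper allows for), and entirely hypothetical rates, prevalence and follow-up. It uses the standard result that, for an exponential model, each person's expected information for the coefficients equals their probability of an observed event times the outer product of (1, x).

```python
import math

def unit_information(x_dist, beta0, beta1, follow_up):
    """Expected per-person Fisher information for (beta0, beta1) in the exponential
    model log(rate) = beta0 + beta1*x, assuming only administrative censoring at
    follow_up. Each contribution is P(event observed | x) * [1, x][1, x]^T."""
    i00 = i01 = i11 = 0.0
    for x, prob in x_dist:
        rate = math.exp(beta0 + beta1 * x)
        p_event = 1.0 - math.exp(-rate * follow_up)
        i00 += prob * p_event
        i01 += prob * p_event * x
        i11 += prob * p_event * x * x
    return i00, i01, i11

def var_linear_predictor(x, n, info):
    """Var(eta_hat(x)) = [1, x] (n I)^{-1} [1, x]^T for a 2x2 unit information I."""
    i00, i01, i11 = info
    det = i00 * i11 - i01 * i01
    return (i11 - 2.0 * x * i01 + x * x * i00) / (det * n)

def risk_interval(x, n, info, beta0, beta1, t0, z=1.96):
    """Approximate 95% interval for the risk by time t0, risk = 1 - exp(-rate * t0)."""
    eta = beta0 + beta1 * x
    se = math.sqrt(var_linear_predictor(x, n, info))
    def risk(e):
        return 1.0 - math.exp(-math.exp(e) * t0)
    return risk(eta - z * se), risk(eta), risk(eta + z * se)

# Hypothetical inputs: binary predictor with prevalence 0.4, baseline rate 0.02/year,
# rate ratio 1.5, 10 years of follow-up, risk of interest at 5 years
x_dist = [(0, 0.6), (1, 0.4)]
b0, b1 = math.log(0.02), math.log(1.5)
info = unit_information(x_dist, b0, b1, follow_up=10.0)
for n in (500, 2000):
    lo, mid, hi = risk_interval(1, n, info, b0, b1, t0=5.0)
    print(n, round(lo, 4), round(mid, 4), round(hi, 4))
```

With a single binary predictor, the variance of the estimated log rate in each group reduces to one over the expected number of events in that group, which is a useful check on the general formula.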
Submitted 24 January, 2025;
originally announced January 2025.
-
Scoping review of methodology for aiding generalisability and transportability of clinical prediction models
Authors:
Kritchavat Ploddi,
Matthew Sperrin,
Glen P. Martin,
Maurice M. O'Connell
Abstract:
Generalisability and transportability of clinical prediction models (CPMs) refer to their ability to maintain predictive performance when applied to new populations. While CPMs may show good generalisability or transportability to a specific new population, it is rare for a CPM to be developed using methods that prioritise good generalisability or transportability. There is an emerging literature on such techniques; therefore, this scoping review aims to summarise the main methodological approaches, assumptions, advantages, disadvantages and future development of methodology aiding generalisability/transportability. Relevant articles were systematically searched from the MEDLINE, Embase, medRxiv and arXiv databases until September 2023 using a predefined set of search terms. Extracted information included methodology description, assumptions, applied examples, advantages and disadvantages. The searches found 1,761 articles; 172 were retained for full-text screening; 18 were finally included. We categorised the methodologies according to whether they were data-driven or knowledge-driven, and whether they are specifically tailored to the target population. Data-driven approaches range from data augmentation to ensemble methods and density ratio weighting, while knowledge-driven strategies rely on causal methodology. Future research could focus on comparing such methodologies on simulated and real datasets to identify their strengths and specific applicability, as well as on synthesising these approaches to enhance their practical usefulness.
Submitted 5 December, 2024;
originally announced December 2024.
-
Cosmic censorship in a (dual) collider
Authors:
Marc Aragonès Fontboté,
David Mateos,
Guillem Pérez Martín,
Wilke van der Schee,
Javier G. Subils
Abstract:
We investigate cosmic censorship in anti-de Sitter space in holographic models in which the ground state is described by a good singularity. These include supersymmetric truncations of string/M-theory, for which a positive-energy theorem holds. At the boundary, our solutions describe a boost-invariant fluid in which the temperature decreases monotonically with time. On the gravity side, they correspond to black-brane spacetimes with a receding horizon. In classical gravity, curvature invariants at the horizon grow without bound. In the full theory this regime may or may not be reached. In some cases it is avoided by a phase transition to a regular geometry. In others it is reached but the boundary hydrodynamic evolution can be continued, provided the equation of state at parametrically small energies is known. Both cases require the inclusion of finite-$N$ or finite-coupling effects.
Submitted 9 January, 2025; v1 submitted 26 November, 2024;
originally announced November 2024.
-
A decomposition of Fisher's information to inform sample size for developing fair and precise clinical prediction models -- part 1: binary outcomes
Authors:
Richard D Riley,
Gary S Collins,
Rebecca Whittle,
Lucinda Archer,
Kym IE Snell,
Paula Dhiman,
Laura Kirton,
Amardeep Legha,
Xiaoxuan Liu,
Alastair Denniston,
Frank E Harrell Jr,
Laure Wynants,
Glen P Martin,
Joie Ensor
Abstract:
When developing a clinical prediction model, the sample size of the development dataset is a key consideration. Small sample sizes lead to greater concerns of overfitting, instability, poor performance and lack of fairness. Previous research has outlined minimum sample size calculations to minimise overfitting and precisely estimate the overall risk. However, even when meeting these criteria, the uncertainty (instability) in individual-level risk estimates may be considerable. In this article we propose how to examine and calculate the sample size required for developing a model with acceptably precise individual-level risk estimates to inform decisions and improve fairness. We outline a five-step process to be used before data collection or when an existing dataset is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model, and an assumed 'core model' either specified directly (i.e., a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors. We produce closed-form solutions that decompose the variance of an individual's risk estimate into Fisher's unit information matrix, predictor values and total sample size; this allows researchers to quickly calculate and examine individual-level uncertainty interval widths and classification instability for specified sample sizes. Such information can be presented to key stakeholders (e.g., health professionals, patients, funders) using prediction and classification instability plots to help identify the (target) sample size required to improve trust, reliability and fairness in individual predictions. Our proposal is implemented in the software module pmstabilityss. We provide real examples and emphasise the importance of clinical context, including any risk thresholds for decision making.
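A minimal sketch of the decomposition for a core logistic model with one predictor, using the unit information I = E[p(1-p) (1, x)(1, x)^T] so that Var(eta_hat(x)) = (1, x)(nI)^{-1}(1, x)^T. It assumes hypothetical coefficients (-2 and 1), a grid of predictor values standing in for the anticipated predictor distribution, and a 10% decision threshold. The "flip" probability is one simple way to express classification instability under a normal approximation on the logit scale; it is an illustration, not the pmstabilityss module.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def unit_information(xs, beta0, beta1):
    """Unit Fisher information for logit(p) = beta0 + beta1*x, averaged over a
    synthetic dataset xs representing the predictor distribution."""
    i00 = i01 = i11 = 0.0
    for x in xs:
        p = expit(beta0 + beta1 * x)
        w = p * (1.0 - p)
        i00 += w
        i01 += w * x
        i11 += w * x * x
    m = len(xs)
    return i00 / m, i01 / m, i11 / m

def individual_uncertainty(x, n, info, beta0, beta1, threshold, z=1.96):
    """Interval for one individual's risk and a simple classification-instability
    probability, using Var(eta_hat) = [1, x] (n I)^{-1} [1, x]^T."""
    i00, i01, i11 = info
    det = i00 * i11 - i01 * i01
    se = math.sqrt((i11 - 2.0 * x * i01 + x * x * i00) / (det * n))
    eta = beta0 + beta1 * x
    cut = math.log(threshold / (1.0 - threshold))  # threshold on the logit scale
    p_above = 1.0 - normal_cdf((cut - eta) / se)   # P(estimated risk > threshold)
    # probability the estimate lands on the other side of the threshold from the core-model risk
    flip = p_above if eta < cut else 1.0 - p_above
    return expit(eta - z * se), expit(eta + z * se), flip

# Hypothetical core model; synthetic predictor values on a grid from -2 to 2
xs = [i / 10.0 - 2.0 for i in range(41)]
b0, b1, threshold = -2.0, 1.0, 0.1
info = unit_information(xs, b0, b1)
for n in (500, 2000):
    lo, hi, flip = individual_uncertainty(0.0, n, info, b0, b1, threshold)
    print(n, round(lo, 3), round(hi, 3), round(flip, 3))
```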
Submitted 24 January, 2025; v1 submitted 12 July, 2024;
originally announced July 2024.
-
How to develop, externally validate, and update multinomial prediction models
Authors:
Celina K Gehringer,
Glen P Martin,
Ben Van Calster,
Kimme L Hyrich,
Suzanne M M Verstappen,
Jamie C Sergeant
Abstract:
Multinomial prediction models (MPMs) have a range of potential applications across healthcare where the primary outcome of interest has multiple nominal or ordinal categories. However, the application of MPMs is scarce, which may be due to the added methodological complexities that they bring. This article provides a guide on how to develop, externally validate, and update MPMs. Using a previously developed and validated MPM for treatment outcomes in rheumatoid arthritis as an example, we outline guidance and recommendations for producing a clinical prediction model using multinomial logistic regression. This article is intended to supplement existing general guidance on prediction model research. This guide is split into three parts: 1) Outcome definition and variable selection, 2) Model development, and 3) Model evaluation (including performance assessment, internal and external validation, and model recalibration). We outline how to evaluate and interpret the predictive performance of MPMs. R code is provided. We recommend the application of MPMs in clinical settings where the prediction of a nominal polytomous outcome is of interest. Future methodological research could focus on MPM-specific considerations for variable selection and sample size criteria for external validation.
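The paper provides R code; as a language-neutral illustration, the following Python sketch shows how a fitted multinomial logistic regression turns its category-specific linear predictors into probabilities that sum to one, with each non-reference category's linear predictor giving its log-odds relative to the reference category. The category names and coefficients are hypothetical and are not those of the rheumatoid arthritis model.

```python
import math

def multinomial_probs(x, coefs, reference="reference"):
    """Predicted category probabilities from a multinomial logistic model.
    coefs maps each non-reference category to (intercept, [slopes])."""
    etas = {k: b0 + sum(b * xi for b, xi in zip(slopes, x))
            for k, (b0, slopes) in coefs.items()}
    denom = 1.0 + sum(math.exp(e) for e in etas.values())
    probs = {k: math.exp(e) / denom for k, e in etas.items()}
    probs[reference] = 1.0 / denom  # reference category has linear predictor 0
    return probs

# Hypothetical two-predictor model with three outcome categories
coefs = {"category_B": (-0.5, [0.8, -0.3]), "category_C": (-1.2, [0.2, 0.6])}
probs = multinomial_probs([1.0, 0.5], coefs, reference="category_A")
print({k: round(v, 3) for k, v in probs.items()})
```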
Submitted 20 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Calibration plots for multistate risk predictions models: an overview and simulation comparing novel approaches
Authors:
Alexander Pate,
Matthew Sperrin,
Richard D. Riley,
Niels Peek,
Tjeerd Van Staa,
Jamie C. Sergeant,
Mamas A. Mamas,
Gregory Y. H. Lip,
Martin O Flaherty,
Michael Barrowman,
Iain Buchan,
Glen P. Martin
Abstract:
Introduction. There is currently no guidance on how to assess the calibration of multistate models used for risk prediction. We introduce several techniques that can be used to produce calibration plots for the transition probabilities of a multistate model, before assessing their performance in the presence of non-informative and informative censoring through a simulation.
Methods. We studied pseudo-values based on the Aalen-Johansen estimator, binary logistic regression with inverse probability of censoring weights (BLR-IPCW), and multinomial logistic regression with inverse probability of censoring weights (MLR-IPCW). The MLR-IPCW approach results in a calibration scatter plot, providing extra insight about the calibration. We simulated data with varying levels of censoring and evaluated the ability of each method to estimate the calibration curve for a set of predicted transition probabilities. We also developed a model predicting the incidence of cardiovascular disease, type 2 diabetes and chronic kidney disease among a cohort of patients derived from linked primary and secondary healthcare records, and evaluated its calibration.
Results. The pseudo-value, BLR-IPCW and MLR-IPCW approaches give unbiased estimates of the calibration curves under non-informative censoring. These methods remained unbiased in the presence of informative censoring, unless the mechanism was strongly informative, with bias concentrated in the areas of predicted transition probabilities of low density.
Conclusions. We recommend implementing either the pseudo-value or BLR-IPCW approaches to produce a calibration curve, combined with the MLR-IPCW approach to produce a calibration scatter plot, which provides additional information over either of the other methods.
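The inverse probability of censoring weighting shared by the BLR-IPCW and MLR-IPCW approaches can be shown in its simplest form. This sketch is a simplification, not the authors' code: it considers a single event type rather than multistate transitions, estimates the censoring distribution with an unadjusted Kaplan-Meier estimator (so it assumes non-informative censoring), and summarises calibration-in-the-large with a weighted observed risk instead of fitting the weighted logistic regression calibration model. The toy data are invented.

```python
def censoring_survival(times, events, s, inclusive=True):
    """Kaplan-Meier estimate of the censoring survival function, treating
    censorings (event == 0) as the 'events'. inclusive=True gives G(s) = P(C > s);
    inclusive=False gives the left limit G(s-)."""
    g = 1.0
    for u in sorted({t for t, e in zip(times, events) if e == 0}):
        if u > s or (u == s and not inclusive):
            break
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
    return g

def ipcw_outcomes(times, events, t):
    """Outcome 'event by time t' with inverse probability of censoring weights;
    people censored before t get outcome None and weight 0."""
    outcomes, weights = [], []
    for ti, ei in zip(times, events):
        if ei == 1 and ti <= t:      # event observed by t
            outcomes.append(1)
            weights.append(1.0 / censoring_survival(times, events, ti, inclusive=False))
        elif ti > t:                 # known to be event-free at t
            outcomes.append(0)
            weights.append(1.0 / censoring_survival(times, events, t, inclusive=True))
        else:                        # censored by t: status at t unknown
            outcomes.append(None)
            weights.append(0.0)
    return outcomes, weights

def weighted_observed_risk(outcomes, weights):
    known = [(y, w) for y, w in zip(outcomes, weights) if y is not None]
    return sum(w * y for y, w in known) / sum(w for _, w in known)

# Toy data: follow-up times and event indicators (1 = event, 0 = censored)
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 0, 1, 0, 1, 1, 0, 0]
outcomes, weights = ipcw_outcomes(times, events, t=6)
print(round(weighted_observed_risk(outcomes, weights), 4))
```

In this toy example the weighted observed risk at t = 6 equals 87/192, one minus the Kaplan-Meier estimate of event-free survival at that time, reflecting the known link between IPCW and Kaplan-Meier estimation. A BLR-IPCW calibration curve would instead regress the weighted outcomes on the predicted risks.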
Submitted 25 August, 2023;
originally announced August 2023.
-
Minimum Sample Size for Developing a Multivariable Prediction Model using Multinomial Logistic Regression
Authors:
Alexander Pate,
Richard D Riley,
Gary S Collins,
Maarten van Smeden,
Ben Van Calster,
Joie Ensor,
Glen P Martin
Abstract:
Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than 2 categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each category k. We propose three criteria to determine the minimum n required, in light of existing criteria developed for binary outcomes. The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R². The third aims to ensure the overall risk is estimated precisely. For criterion (i), we show the sample size must be based on the anticipated Cox-Snell R² of distinct one-to-one logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R² of the multinomial logistic regression. We tested the performance of the proposed criterion (i) through a simulation study, and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) are natural extensions of previously proposed criteria for binary outcomes. We illustrate how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R package and Stata modules.
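Criterion (i) builds on the binary-outcome formula n = p / ((S - 1) ln(1 - R²_CS / S)), where S is the target shrinkage factor (commonly 0.9). The sketch below applies that formula to each k-versus-reference sub-model, as the abstract describes, and then, as an assumption of this sketch rather than a quotation of the paper, converts each sub-model size to a whole-cohort size by dividing by the proportion of participants in category k or the reference category. Criterion (ii) is omitted, the criterion (iii) shown is its binary form, all inputs are hypothetical, and this is not the pmsampsize implementation.

```python
import math

def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Binary-outcome criterion (i): n = p / ((S - 1) * ln(1 - R2_CS / S))."""
    return p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))

def multinomial_min_n(sub_models, shrinkage=0.9):
    """sub_models: one (p_k, r2_cs_k, prop_k) tuple per non-reference category k, where
    r2_cs_k is the anticipated Cox-Snell R2 of the k-versus-reference logistic model and
    prop_k is the proportion of participants in category k or the reference category."""
    needed = []
    for p_k, r2_k, prop_k in sub_models:
        m_k = n_for_shrinkage(p_k, r2_k, shrinkage)  # size of the k-vs-reference subset
        needed.append(m_k / prop_k)                   # scale up to the whole cohort (assumed)
    return math.ceil(max(needed))

def n_for_overall_risk(prevalence, margin=0.05):
    """Criterion (iii), binary form: estimate a proportion to within +/- margin."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1.0 - prevalence))

# Hypothetical inputs: three outcome categories, reference category holding 50% of participants
subs = [(10, 0.10, 0.80), (10, 0.15, 0.70)]
print(multinomial_min_n(subs), n_for_overall_risk(0.20))
```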
Submitted 26 July, 2022;
originally announced July 2022.
-
Imputation and Missing Indicators for handling missing data in the development and implementation of clinical prediction models: a simulation study
Authors:
Rose Sisk,
Matthew Sperrin,
Niels Peek,
Maarten van Smeden,
Glen P. Martin
Abstract:
Background: Existing guidelines for handling missing data are generally not consistent with the goals of prediction modelling, where missing data can occur at any stage of the model pipeline. Multiple imputation (MI), often heralded as the gold standard approach, can be challenging to apply in the clinic. Clearly, the outcome cannot be used to impute data at prediction time. Regression imputation (RI) may offer a pragmatic alternative in the prediction context, one that is simpler to apply in the clinic. Moreover, the use of missing indicators can handle informative missingness, but it is currently unknown how well they perform within CPMs. Methods: We performed a simulation study where data were generated under various missing data mechanisms to compare the predictive performance of CPMs developed using both imputation methods. We consider deployment scenarios where missing data is permitted/prohibited, and develop models that use/omit the outcome during imputation and include/omit missing indicators. Results: When complete data must be available at deployment, our findings were in line with widely used recommendations; that the outcome should be used to impute development data under MI, yet omitted under RI. When imputation is applied at deployment, omitting the outcome from the imputation at development was preferred. Missing indicators improved model performance in some specific cases, but can be harmful when missingness is dependent on the outcome. Conclusion: We provide evidence that commonly taught principles of handling missing data via MI may not apply to CPMs, particularly when data can be missing at deployment. In such settings, RI and missing indicator methods can (marginally) outperform MI. As shown, the performance of the missing data handling method must be evaluated on a study-by-study basis, and should be based on whether missing data are allowed at deployment.
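A minimal sketch of regression imputation combined with a missing indicator, for a hypothetical setting with two predictors where only x2 can be missing. Consistent with the finding that omitting the outcome from imputation was preferred when imputation is applied at deployment, the imputation model uses predictors only, and the same fitted imputation model is reused for new patients. Variable names and data are illustrative; a real application would typically use a richer imputation model.

```python
from statistics import mean

def fit_regression_imputation(rows):
    """Fit x2 ~ x1 on development rows with x2 observed. The outcome is deliberately
    left out, so the same model can be applied at deployment when outcomes are unknown."""
    pairs = [(r["x1"], r["x2"]) for r in rows if r["x2"] is not None]
    m1 = mean(a for a, _ in pairs)
    m2 = mean(b for _, b in pairs)
    sxx = sum((a - m1) ** 2 for a, _ in pairs)
    sxy = sum((a - m1) * (b - m2) for a, b in pairs)
    slope = sxy / sxx
    return m2 - slope * m1, slope

def prepare_row(row, imputation_model):
    """Impute x2 if missing and add a missing indicator as an extra model input."""
    intercept, slope = imputation_model
    missing = row["x2"] is None
    x2 = intercept + slope * row["x1"] if missing else row["x2"]
    return {"x1": row["x1"], "x2": x2, "x2_missing": int(missing)}

development = [{"x1": 0.0, "x2": 1.0}, {"x1": 1.0, "x2": 3.0},
               {"x1": 2.0, "x2": 5.0}, {"x1": 3.0, "x2": None}]
imp = fit_regression_imputation(development)
# The fitted imputation model is stored and reused unchanged for new patients
new_patient = prepare_row({"x1": 4.0, "x2": None}, imp)
print(imp, new_patient)
```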
Submitted 24 June, 2022;
originally announced June 2022.
-
A scoping review of causal methods enabling predictions under hypothetical interventions
Authors:
Lijing Lin,
Matthew Sperrin,
David A. Jenkins,
Glen P. Martin,
Niels Peek
Abstract:
Background and Aims: The methods with which prediction models are usually developed mean that neither the parameters nor the predictions should be interpreted causally. However, when prediction models are used to support decision making, there is often a need to predict outcomes under hypothetical interventions. We aimed to identify published methods for developing and validating prediction models that enable risk estimation of outcomes under hypothetical interventions using causal inference, and to summarise their main methodological approaches, underlying assumptions, targeted estimands, potential pitfalls and challenges in their use, and unresolved methodological challenges.
Methods: We systematically reviewed literature published by December 2019, considering papers in the health domain that used causal considerations to enable prediction models to be used for predictions under hypothetical interventions.
Results: We identified 4919 papers through database searches and a further 115 papers through manual searches, of which 13 were selected for inclusion, from both the statistical and the machine learning literature. Most of the identified methods for causal inference from observational data were based on marginal structural models and g-estimation.
Conclusions: There exist two broad methodological approaches for incorporating prediction under hypothetical interventions into clinical prediction models: 1) enriching prediction models derived from observational studies with estimated causal effects from clinical trials and meta-analyses; and 2) estimating prediction models and causal effects directly from observational data. These methods need to be extended to dynamic treatment regimes and to consider multiple interventions in order to operationalise a clinical decision support system. Techniques for validating 'causal prediction models' are still in their infancy.
Submitted 12 January, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Towards a Framework for the Design, Implementation and Reporting of Methodology Scoping Reviews
Authors:
Glen P. Martin,
David Jenkins,
Lucy Bull,
Rose Sisk,
Lijing Lin,
William Hulme,
Anthony Wilson,
Wenjuan Wang,
Michael Barrowman,
Camilla Sammut-Powell,
Alexander Pate,
Matthew Sperrin,
Niels Peek
Abstract:
Background: In view of the growth of published papers, there is an increasing need for studies that summarise scientific research. An increasingly common review is a 'Methodology scoping review', which provides a summary of existing analytical methods, techniques and software, proposed or applied in research articles, which address an analytical problem or further an analytical approach. However, guidelines for their design, implementation and reporting are limited.
Methods: Drawing on the experiences of the authors, which were consolidated through a series of face-to-face workshops, we summarise the challenges inherent in conducting a methodology scoping review and offer suggestions of best practice to promote future guideline development.
Results: We identified three challenges of conducting a methodology scoping review. First, identification of search terms; one cannot usually define the search terms a priori and the language used for a particular method can vary across the literature. Second, the scope of the review requires careful consideration since new methodology is often not described (in full) within abstracts. Third, many new methods are motivated by a specific clinical question, where the methodology may only be documented in supplementary materials. We formulated several recommendations that build upon existing review guidelines. These recommendations ranged from an iterative approach to defining search terms through to screening and data extraction processes.
Conclusion: Although methodology scoping reviews are an important aspect of research, there is currently a lack of guidelines to standardise their design, implementation and reporting. We recommend a wider discussion on this topic.
Submitted 16 January, 2020;
originally announced January 2020.
-
Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches
Authors:
Glen P. Martin,
Matthew Sperrin,
Kym I. E. Snell,
Iain Buchan,
Richard D. Riley
Abstract:
Clinical prediction models (CPMs) are used to predict clinically relevant outcomes or events. Typically, prognostic CPMs are derived to predict the risk of a single future outcome. However, with rising emphasis on the prediction of multi-morbidity, there is growing need for CPMs to simultaneously predict risks for each of multiple future outcomes. A common approach to multi-outcome risk prediction is to derive a CPM for each outcome separately, then multiply the predicted risks. This approach is only valid if the outcomes are conditionally independent given the covariates, and it fails to exploit the potential relationships between the outcomes. This paper outlines several approaches that could be used to develop prognostic CPMs for multiple outcomes. We consider four methods, ranging in complexity and in their conditional independence assumptions: namely, probabilistic classifier chains, multinomial logistic regression, multivariate logistic regression, and a Bayesian probit model. These are compared with methods that rely on conditional independence: separate univariate CPMs and stacked regression. Employing a simulation study and a real-world example via the MIMIC-III database, we illustrate that CPMs for joint risk prediction of multiple outcomes should only be derived using methods that model the residual correlation between outcomes. In such a situation, our results suggest that probabilistic classifier chains, multinomial logistic regression or the Bayesian probit model are all appropriate choices. We call into question the development of CPMs for each outcome in isolation when multiple correlated or structurally related outcomes are of interest and recommend more holistic risk prediction.
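A small numerical illustration of why multiplying separately predicted risks can mislead when outcomes are correlated. The joint probabilities below are invented for a single covariate pattern; because the two outcomes are positively associated, the product of the two marginal risks understates the true probability of both outcomes occurring, even if each marginal risk were predicted perfectly.

```python
def marginals_and_product(joint):
    """joint maps (a, b) outcome pairs to probabilities for one covariate pattern."""
    p_a = joint[(1, 1)] + joint[(1, 0)]
    p_b = joint[(1, 1)] + joint[(0, 1)]
    return p_a, p_b, p_a * p_b

# Hypothetical joint risks for two positively associated binary outcomes A and B
joint = {(1, 1): 0.15, (1, 0): 0.05, (0, 1): 0.10, (0, 0): 0.70}
p_a, p_b, product = marginals_and_product(joint)
print(f"P(A)={p_a:.2f}, P(B)={p_b:.2f}, product={product:.4f}, "
      f"true joint={joint[(1, 1)]:.2f}")
```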
Submitted 21 January, 2020;
originally announced January 2020.
-
Examining the impact of data quality and completeness of electronic health records on predictions of patients risks of cardiovascular disease
Authors:
Yan Li,
Matthew Sperrin,
Glen P. Martin,
Darren M Ashcroft,
Tjeerd Pieter van Staa
Abstract:
The objective is to assess the extent of variation in the data quality and completeness of electronic health records and its impact on the robustness of risk predictions of incident cardiovascular disease (CVD) using a risk prediction tool that is based on routinely collected data (QRISK3). The study design is a longitudinal cohort study set in 392 general practices (including 3.6 million patients) linked to hospital admission data. Variation in data quality was assessed using Saez stability metrics quantifying the outlyingness of each practice. Statistical frailty models evaluated whether the accuracy of QRISK3 individual predictions and the effects of overall risk factors (linear predictor) varied between practices. There was substantial heterogeneity between practices in CVD incidence unaccounted for by QRISK3. In the lowest quintile of statistical frailty, a QRISK3 predicted risk of 10% for a female patient ranged between 7.1% and 9.0% when practice variability was incorporated into the statistical frailty models; for the highest quintile, the range was 10.9%-16.4%. Data quality (using Saez metrics) and completeness were comparable across different levels of statistical frailty. For example, the proportion with missing ethnicity information was 55.7%, 62.7%, 57.8%, 64.8% and 62.1% for practices from the lowest to the highest quintile of statistical frailty, respectively. The effects of risk factors did not vary between practices, with little statistical variation in beta coefficients. In conclusion, the considerable unmeasured heterogeneity in CVD incidence between practices was not explained by variations in data quality or effects of risk factors. QRISK3 risk prediction should be supplemented with clinical judgement and evidence of additional risk factors.
Submitted 19 November, 2019;
originally announced November 2019.