Think before you shrink: Alternatives to default shrinkage methods can improve prediction accuracy, calibration and coverage
Authors:
Mark A. van de Wiel,
Gwenaël G. R. Leday,
Jeroen Hoogland,
Martijn W. Heymans,
Erik W. van Zwet,
Ailko H. Zwinderman
Abstract:
While shrinkage is essential in high-dimensional settings, its use for low-dimensional regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces bias, which may harm two other measures of predictive performance: calibration and coverage of confidence intervals. Much of the criticism stems from the usage…
▽ More
While shrinkage is essential in high-dimensional settings, its use for low-dimensional regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces bias, which may harm two other measures of predictive performance: calibration and coverage of confidence intervals. Much of the criticism stems from the usage of standard shrinkage methods, such as lasso and ridge with a single, cross-validated penalty. Our aim is to show that readily available alternatives can strongly improve predictive performance, in terms of accuracy, calibration or coverage. For linear regression, we use small sample splits of a large, fairly typical epidemiological data set to illustrate this. We show that usage of differential ridge penalties for covariate groups may enhance prediction accuracy, while calibration and coverage benefit from additional shrinkage of the penalties. In the logistic setting, we apply an external simulation to demonstrate that local shrinkage improves calibration with respect to global shrinkage, while providing better prediction accuracy than other solutions, like Firth's correction. The benefits of the alternative shrinkage methods are easily accessible via example implementations using \texttt{mgcv} and \texttt{r-stan}, including the estimation of multiple penalties. A synthetic copy of the large data set is shared for reproducibility.
△ Less
Submitted 24 January, 2023;
originally announced January 2023.
Stable prediction with radiomics data
Authors:
Carel F. W. Peeters,
Caroline Übelhör,
Steven W. Mes,
Roland Martens,
Thomas Koopman,
Pim de Graaf,
Floris H. P. van Velden,
Ronald Boellaard,
Jonas A. Castelijns,
Dennis E. te Beest,
Martijn W. Heymans,
Mark A. van de Wiel
Abstract:
Motivation: Radiomics refers to the high-throughput mining of quantitative features from radiographic images. It is a promising field in that it may provide a non-invasive solution for screening and classification. Standard machine learning classification and feature selection techniques, however, tend to display inferior performance in terms of (the stability of) predictive performance. This is d…
▽ More
Motivation: Radiomics refers to the high-throughput mining of quantitative features from radiographic images. It is a promising field in that it may provide a non-invasive solution for screening and classification. Standard machine learning classification and feature selection techniques, however, tend to display inferior performance in terms of (the stability of) predictive performance. This is due to the heavy multicollinearity present in radiomic data. We set out to provide an easy-to-use approach that deals with this problem.
Results: We developed a four-step approach that projects the original high-dimensional feature space onto a lower-dimensional latent-feature space, while retaining most of the covariation in the data. It consists of (i) penalized maximum likelihood estimation of a redundancy filtered correlation matrix. The resulting matrix (ii) is the input for a maximum likelihood factor analysis procedure. This two-stage maximum-likelihood approach can be used to (iii) produce a compact set of stable features that (iv) can be directly used in any (regression-based) classifier or predictor. It outperforms other classification (and feature selection) techniques in both external and internal validation settings regarding survival in squamous cell cancers.
△ Less
Submitted 27 March, 2019;
originally announced March 2019.