-
Blocked Cross-Validation: A Precise and Efficient Method for Hyperparameter Tuning
Authors:
Giovanni Maria Merola
Abstract:
Hyperparameter tuning plays a crucial role in optimizing the performance of predictive learners. Cross-validation (CV) is a widely adopted technique for estimating the error of different hyperparameter settings. Repeated cross-validation (RCV) has been commonly employed to reduce the variability of CV errors. In this paper, we introduce a novel approach called blocked cross-validation (BCV), where the repetitions are blocked with respect to both the CV partition and the random behavior of the learner. Theoretical analysis and empirical experiments demonstrate that BCV provides more precise error estimates than RCV, even with a significantly reduced number of runs. We present extensive examples using real-world data sets to showcase the effectiveness and efficiency of BCV in hyperparameter tuning. Our results indicate that BCV outperforms RCV in hyperparameter tuning, achieving greater precision with fewer computations.
Submitted 31 July, 2023; v1 submitted 11 June, 2023;
originally announced June 2023.
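A minimal sketch of the blocking idea in the abstract above, assuming a generic randomized learner: within each repetition, every hyperparameter setting is evaluated on the same fold partition and with the same learner seed, so differences between settings are not inflated by partition or seed noise. The names `blocked_cv_errors` and `bagged_ridge`, the toy learner (ridge regression on a random 80% subsample), and the simulated data are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def bagged_ridge(X_train, y_train, X_test, lam, rng):
    """Hypothetical randomized learner: ridge regression fitted on a random
    80% subsample of the training rows; the randomness comes from `rng`."""
    n = X_train.shape[0]
    idx = rng.choice(n, size=int(0.8 * n), replace=False)
    Xs, ys = X_train[idx], y_train[idx]
    p = Xs.shape[1]
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ ys)
    return X_test @ beta


def blocked_cv_errors(X, y, fit_predict, settings, n_reps=5, n_folds=5, seed=0):
    """Mean CV squared error for each setting, with the repetitions blocked on
    both the fold partition and the learner's random seed."""
    master = np.random.default_rng(seed)
    n = X.shape[0]
    errors = np.zeros((n_reps, len(settings)))
    for r in range(n_reps):
        # One random partition and one learner seed per repetition,
        # shared by every hyperparameter setting (the "block").
        folds = master.permutation(n) % n_folds
        learner_seed = int(master.integers(2**32))
        for j, setting in enumerate(settings):
            sq_err = 0.0
            for k in range(n_folds):
                test = folds == k
                # Same seed for fold k regardless of the setting being evaluated.
                rng = np.random.default_rng([learner_seed, k])
                pred = fit_predict(X[~test], y[~test], X[test], setting, rng)
                sq_err += np.sum((y[test] - pred) ** 2)
            errors[r, j] = sq_err / n
    return errors.mean(axis=0)


# Usage on simulated data (hypothetical example).
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + data_rng.normal(size=100)
cv_errors = blocked_cv_errors(X_demo, y_demo, bagged_ridge, settings=[0.1, 1.0, 10.0])
```

Because all settings share the same partitions and seeds, two identical settings receive exactly the same error, which is the pairing that unblocked repeated CV lacks.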
-
Sparse Principal Components Analysis: a Tutorial
Authors:
Giovanni Maria Merola
Abstract:
The topic of this tutorial is Least Squares Sparse Principal Components Analysis (LS SPCA), a simple method for computing approximate Principal Components that are combinations of only a few of the observed variables. Like Principal Components, these components are uncorrelated and sequentially best approximate the dataset. The derivation of LS SPCA is intuitive for anyone familiar with linear regression. Because LS SPCA is based on a different optimality criterion from other SPCA methods, it does not suffer from their serious drawbacks. I will demonstrate on two datasets how useful and parsimonious sparse PCs can be computed. An R package for computing LS SPCA is available for download.
Submitted 28 May, 2021;
originally announced May 2021.
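The regression view mentioned in the abstract above can be illustrated for the first component only. Assuming the criterion is the sum of squares of the centered data explained by regressing every variable on a component t = X_S a built from a fixed subset S of variables, the explained quantity is a' (Σ²)_SS a / a' Σ_SS a with Σ = X'X, and the best loadings solve a generalized eigenproblem. This Python/NumPy sketch rests on that assumption; subset selection and the uncorrelatedness constraints for later components are omitted, and `ls_spca_first` is a hypothetical name, not the R package's API.

```python
import numpy as np


def ls_spca_first(X, subset):
    """First sparse component on a fixed subset of columns.

    Maximizes the sum of squares of X (centered here) explained by
    t = Xc[:, subset] @ a, i.e. a'(Sigma^2)_SS a / a'Sigma_SS a, Sigma = Xc'Xc.
    Returns unit-norm sparse loadings (length p) and the explained sum of squares.
    """
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc
    idx = np.asarray(subset)
    A = (Sigma @ Sigma)[np.ix_(idx, idx)]   # numerator matrix (Sigma^2)_SS
    B = Sigma[np.ix_(idx, idx)]             # denominator matrix Sigma_SS (positive definite)
    # Reduce A v = lam B v to a symmetric eigenproblem via the Cholesky factor of B.
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ A @ Linv.T)
    v = Linv.T @ vecs[:, -1]                # top generalized eigenvector
    a = np.zeros(X.shape[1])
    a[idx] = v / np.linalg.norm(v)
    return a, vals[-1]
```

With the full set of variables the criterion reduces to the largest eigenvalue of Σ, i.e. the first ordinary PC; restricting the subset can only lower it.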
-
SIMPCA: A framework for rotating and sparsifying principal components
Authors:
Giovanni Maria Merola
Abstract:
We propose an algorithmic framework for computing sparse components from rotated principal components. This methodology, called SIMPCA, is intended to replace the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The simplified components obtained in this way are highly correlated with the corresponding rotated components. By choosing different simplification strategies, different sparse solutions can be obtained, which can be used to compare alternative interpretations of the principal components. Using some publicly available data sets, we give examples of how effective simplified solutions can be achieved with SIMPCA.
Submitted 8 October, 2019;
originally announced October 2019.
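A minimal sketch of the projection step described in the abstract above, under stated assumptions: the principal components are rotated with varimax, and a hypothetical simplification strategy keeps, for one rotated component, the `k` variables with the largest absolute rotated loadings. Instead of zeroing the other coefficients, the rotated component is regressed on the retained variables, which yields a genuinely sparse component and its correlation with the rotated one. The function names and the strategy are illustrations; SIMPCA's actual strategies may differ.

```python
import numpy as np


def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-8):
    """Standard varimax rotation of a p x m loadings matrix."""
    p, m = Phi.shape
    R = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lam = Phi @ R
        grad = Phi.T @ (Lam**3 - (gamma / p) * Lam @ np.diag(np.sum(Lam**2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return Phi @ R


def project_rotated_component(X, n_components, comp, k):
    """Hypothetical SIMPCA-style step: project rotated component `comp` onto
    the k variables with the largest absolute rotated loadings."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    rotated = varimax(Vt[:n_components].T)     # p x n_components rotated loadings
    t = Xc @ rotated[:, comp]                  # rotated component scores
    subset = np.sort(np.argsort(-np.abs(rotated[:, comp]))[:k])
    # Least squares regression of t on the retained variables gives sparse loadings.
    b_sub, *_ = np.linalg.lstsq(Xc[:, subset], t, rcond=None)
    b = np.zeros(X.shape[1])
    b[subset] = b_sub
    t_sparse = Xc @ b
    # Both score vectors have zero mean, so the cosine equals the correlation.
    corr = (t @ t_sparse) / (np.linalg.norm(t) * np.linalg.norm(t_sparse))
    return b, corr
```

Because `t_sparse` is the least squares projection of `t`, the correlation lies in [0, 1] and reaches 1 when all variables are retained.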
-
Projection Sparse Principal Component Analysis: an efficient least squares method
Authors:
Giovanni Maria Merola
Abstract:
We propose a new sparse principal component analysis (SPCA) method in which the solutions are obtained by projecting the full cardinality principal components onto subsets of variables. The resulting components are guaranteed to explain a given proportion of variance. The computation of these solutions is very efficient. The proposed method compares well with the optimal least squares sparse components. We show that other SPCA methods fail to identify the best sparse approximations of the principal components and explain less variance than our solutions. We illustrate and compare our method through the analysis of a real dataset containing socioeconomic data and through computational results on nine datasets of increasing dimension, with up to 16,000 variables.
Submitted 7 October, 2019; v1 submitted 3 December, 2016;
originally announced December 2016.
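The projection idea in the abstract above can be sketched as follows, assuming a greedy forward selection as a stand-in for the paper's variable selection: variables are added one at a time, each time choosing the one that most increases the share of the full principal component's variance reproduced by its least squares projection, and the search stops once a target share (`target`, a hypothetical parameter) is reached. This is an illustration under those assumptions, not the published algorithm.

```python
import numpy as np


def forward_projection_spc(X, comp=0, target=0.95):
    """Hypothetical greedy projection of one principal component onto a growing
    subset of variables until R^2(projection, PC) >= target."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc = Xc @ Vt[comp]                      # full-cardinality component scores
    p = X.shape[1]
    subset, r2 = [], 0.0
    while r2 < target and len(subset) < p:
        best_j, best_r2 = None, -1.0
        for j in range(p):
            if j in subset:
                continue
            cols = subset + [j]
            coef, *_ = np.linalg.lstsq(Xc[:, cols], pc, rcond=None)
            fit = Xc[:, cols] @ coef
            cand = (fit @ fit) / (pc @ pc)  # share of the PC's variance reproduced
            if cand > best_r2:
                best_j, best_r2 = j, cand
        subset.append(best_j)
        r2 = best_r2
    # Final sparse loadings: regression of the PC on the selected variables.
    coef, *_ = np.linalg.lstsq(Xc[:, subset], pc, rcond=None)
    loadings = np.zeros(p)
    loadings[subset] = coef
    return loadings, sorted(subset), r2
```

Since the PC lies in the span of all variables, the loop always terminates with R² at or above the target, at worst with every variable selected.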
-
Sparse Principal Component Analysis: a Least Squares approximation approach
Authors:
Giovanni Maria Merola
Abstract:
Sparse Principal Components Analysis aims to find principal components with few non-zero loadings. We derive such sparse solutions by adding a genuine sparsity requirement to the original Principal Components Analysis (PCA) objective function. This approach differs from others because it preserves PCA's original optimality: uncorrelatedness of the components and Least Squares approximation of the data. To identify the best subset of non-zero loadings, we propose a Branch-and-Bound search and an iterative elimination algorithm. The latter algorithm finds sparse solutions with large loadings and can be run without specifying in advance either the cardinality of the loadings or the number of components to compute. We give thorough comparisons with existing Sparse PCA methods and several examples on real datasets.
Submitted 17 August, 2014; v1 submitted 5 June, 2014;
originally announced June 2014.
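The iterative elimination idea in the abstract above can be sketched as a backward search, under stated assumptions: starting from all variables, the variable with the smallest absolute loading in the current first least squares sparse component is dropped, as long as the explained sum of squares stays above a hypothetical fraction `keep` of that of the first full PC. The stopping rule and the function names are illustrations, not the paper's exact algorithm, which also handles subsequent components and the uncorrelatedness requirement.

```python
import numpy as np


def first_component(Sigma, idx):
    """First component on columns idx of the cross-product matrix Sigma,
    maximizing a'(Sigma^2)_II a / a'Sigma_II a. Returns (loadings, value)."""
    A = (Sigma @ Sigma)[np.ix_(idx, idx)]
    L = np.linalg.cholesky(Sigma[np.ix_(idx, idx)])
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ A @ Linv.T)
    v = Linv.T @ vecs[:, -1]
    return v / np.linalg.norm(v), vals[-1]


def backward_elimination(X, keep=0.9):
    """Hypothetical backward elimination: drop the smallest |loading| while the
    explained sum of squares stays >= keep * (that of the full first PC)."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc
    idx = list(range(X.shape[1]))
    a, full_value = first_component(Sigma, idx)
    value = full_value
    while len(idx) > 1:
        drop = int(np.argmin(np.abs(a)))        # position within idx
        trial = idx[:drop] + idx[drop + 1:]
        a_trial, v_trial = first_component(Sigma, trial)
        if v_trial < keep * full_value:
            break                               # removal would lose too much
        idx, a, value = trial, a_trial, v_trial
    loadings = np.zeros(X.shape[1])
    loadings[idx] = a
    return loadings, value / full_value
```

The cardinality is not fixed in advance: it follows from where the explained share first drops below `keep`.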