
Blocked Cross-Validation: A Precise and Efficient Method for Hyperparameter Tuning

Abstract: Hyperparameter tuning plays a crucial role in optimizing the performance of predictive learners. Cross-validation (CV) is a widely adopted technique for estimating the error of different hyperparameter settings. Repeated cross-validation (RCV) has been commonly employed to reduce the variability of CV errors. In this paper, we introduce a novel approach called blocked cross-validation (BCV), where the repetitions are blocked with respect to both CV partition and the random behavior of the learner. Theoretical analysis and empirical experiments demonstrate that BCV provides more precise error estimates compared to RCV, even with a significantly reduced number of runs. We present extensive examples using real-world data sets to showcase the effectiveness and efficiency of BCV in hyperparameter tuning. Our results indicate that BCV outperforms RCV in hyperparameter tuning, achieving greater precision with fewer computations.

Submitted 31 July, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

Comments: 28 pages, 7 figures

MSC Class: 62-00 ACM Class: G.3
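
A minimal Python sketch of the blocking idea described in the abstract above, not the authors' implementation: within each repetition, every hyperparameter value is evaluated on the same CV partition and with the same learner seed, so that differences between settings are not confounded with partition or learner randomness. The names `kfold_indices`, `cv_error`, `blocked_cv`, and `toy_learner` are hypothetical, and the toy learner (a shrunken bootstrap mean) is only an assumed stand-in for a learner with random behavior.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 with the given seed and split the result into k folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(fit_predict, X, y, param, folds, learner_seed):
    """Mean squared error of one k-fold CV run for a single hyperparameter value."""
    sq_errors = []
    for i, test in enumerate(folds):
        # Training indices are all folds except the i-th one.
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        preds = fit_predict(
            [X[j] for j in train], [y[j] for j in train],
            [X[j] for j in test], param, learner_seed,
        )
        sq_errors.extend((p - y[j]) ** 2 for p, j in zip(preds, test))
    return mean(sq_errors)


def blocked_cv(fit_predict, X, y, params, k=5, repetitions=3, base_seed=0):
    """Return {param: [CV error per repetition]}, blocking on partition and seed."""
    errors = {p: [] for p in params}
    for r in range(repetitions):
        # One partition and one learner seed per repetition (the "block"),
        # shared by every hyperparameter value (seed offsets are arbitrary).
        folds = kfold_indices(len(X), k, seed=base_seed + r)
        learner_seed = base_seed + 1000 + r
        for p in params:
            errors[p].append(cv_error(fit_predict, X, y, p, folds, learner_seed))
    return errors


def toy_learner(X_train, y_train, X_test, shrink, seed):
    """Assumed toy learner: predicts shrink * mean of a bootstrap sample of y_train."""
    rng = random.Random(seed)
    sample = [rng.choice(y_train) for _ in y_train]
    return [shrink * mean(sample)] * len(X_test)
```

Calling `blocked_cv(toy_learner, X, y, params=[0.5, 1.0])` returns one list of errors per setting, aligned by repetition, so settings can be compared block by block. An unblocked variant would instead draw a fresh partition and seed for each (setting, repetition) pair.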

arXiv:2105.13581 [pdf, other]

Sparse Principal Components Analysis: a Tutorial

Authors: Giovanni Maria Merola

Abstract: The topic of this tutorial is Least Squares Sparse Principal Components Analysis (LS SPCA), which is a simple method for computing approximated Principal Components that are combinations of only a few of the observed variables. Analogously to Principal Components, these components are uncorrelated and sequentially best approximate the dataset. The derivation of LS SPCA is intuitive for anyone familiar with linear regression. Since LS SPCA is based on a different optimality from other SPCA methods, it does not suffer from their serious drawbacks. I will demonstrate on two datasets how useful and parsimonious sparse PCs can be computed. An R package for computing LS SPCA is available for download.

Submitted 28 May, 2021; originally announced May 2021.

Comments: 26 pages, preprint submitted for publication

arXiv:1910.03266 [pdf, other]

doi 10.1080/02664763.2019.1676404

SIMPCA: A framework for rotating and sparsifying principal components

Authors: Giovanni Maria Merola

Abstract: We propose an algorithmic framework for computing sparse components from rotated principal components. This methodology, called SIMPCA, is useful to replace the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The so simplified components are highly correlated with the corresponding components. By choosing different simplification strategies, different sparse solutions can be obtained, which can be used to compare alternative interpretations of the principal components. We give some examples of how effective simplified solutions can be achieved with SIMPCA using some publicly available data sets.

Submitted 8 October, 2019; originally announced October 2019.

Comments: Accepted for publication by Journal of Applied Statistics on Oct 2, 2019

arXiv:1612.00939 [pdf, other]

Projection Sparse Principal Component Analysis: an efficient least squares method

Authors: Giovanni Maria Merola

Abstract: We propose a new sparse principal component analysis (SPCA) method in which the solutions are obtained by projecting the full cardinality principal components onto subsets of variables. The resulting components are guaranteed to explain a given proportion of variance. The computation of these solutions is very efficient. The proposed method compares well with the optimal least squares sparse components. We show that other SPCA methods fail to identify the best sparse approximations of the principal components and explain less variance than our solutions. We illustrate and compare our method with the analysis of a real dataset containing socioeconomic data and the computational results for nine datasets of increasing dimension with up to 16,000 variables.

Submitted 7 October, 2019; v1 submitted 3 December, 2016; originally announced December 2016.

Comments: 31 pages, submitted for publication
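
A minimal sketch, assuming NumPy, of the projection step named in the abstract above: a full-cardinality principal component is regressed by least squares on a chosen subset of variables, giving loadings that are zero outside the subset. This is an illustration, not the author's algorithm or R package. The function name `project_pc_onto_subset` is hypothetical, the subset is assumed to be given rather than selected, and the returned R² (share of the component's variance reproduced by the projection) is an assumed summary, not necessarily the paper's variance-explained measure.

```python
import numpy as np


def project_pc_onto_subset(X, subset, component=0):
    """Least-squares projection of one principal component onto X[:, subset].

    Returns (loadings, r2): loadings is zero outside `subset`; r2 is the
    share of the component's sum of squares reproduced by the sparse version.
    """
    Xc = X - X.mean(axis=0)                      # center the columns
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc_scores = Xc @ Vt[component]               # full-cardinality component

    # Regress the component scores on the selected variables only.
    coef, *_ = np.linalg.lstsq(Xc[:, subset], pc_scores, rcond=None)

    loadings = np.zeros(X.shape[1])
    loadings[subset] = coef                      # sparse loadings
    sparse_scores = Xc @ loadings                # fitted values of the regression

    # Both score vectors have mean zero, so this ratio is the regression R^2.
    r2 = (sparse_scores @ sparse_scores) / (pc_scores @ pc_scores)
    return loadings, r2
```

With all variables in `subset`, the projection reproduces the component exactly and R² equals 1 up to rounding; smaller subsets trade R² for sparsity.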

arXiv:1406.1381 [pdf, ps, other]

Sparse Principal Component Analysis: a Least Squares approximation approach

Authors: Giovanni Maria Merola

Abstract: Sparse Principal Components Analysis aims to find principal components with few non-zero loadings. We derive such sparse solutions by adding a genuine sparsity requirement to the original Principal Components Analysis (PCA) objective function. This approach differs from others because it preserves PCA's original optimality: uncorrelatedness of the components and Least Squares approximation of the data. To identify the best subset of non-zero loadings we propose a Branch-and-Bound search and an iterative elimination algorithm. This last algorithm finds sparse solutions with large loadings and can be run without specifying the cardinality of the loadings and the number of components to compute in advance. We give thorough comparisons with the existing Sparse PCA methods and several examples on real datasets.

Submitted 17 August, 2014; v1 submitted 5 June, 2014; originally announced June 2014.

Comments: 25 pages, with appendix. Submitted to Australian & New Zealand Journal of Statistics

MSC Class: 62H12 62H25

Showing 1–5 of 5 results for author: Merola, G M