-
Shrinkage Estimation of Functions of Large Noisy Symmetric Matrices
Authors:
Panagiotis Lolas,
Lexing Ying
Abstract:
We study the problem of estimating functions of a large symmetric matrix $A_n$ when we only have
access to a noisy estimate $\hat{A}_n=A_n+\sigma Z_n/\sqrt{n}$. We are interested
in the case that $Z_n$ is a Wigner ensemble and suggest an algorithm based on nonlinear shrinkage of
the eigenvalues of $\hat{A}_n$. As an intermediate step we explain how recovery of the spectrum of
$A_n$ is possible using only the spectrum of $\hat{A}_n$. Our algorithm has important applications,
for example, in solving high-dimensional noisy systems of equations or symmetric matrix
denoising. Throughout our analysis we rely on tools from random matrix theory.
Submitted 9 June, 2021;
originally announced June 2021.
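To make the observation model in this abstract concrete, the following is a minimal simulation sketch of $\hat{A}_n=A_n+\sigma Z_n/\sqrt{n}$. It is not the authors' shrinkage algorithm; it only shows how Wigner noise spreads the spectrum of $A_n$. The helper names `wigner` and `noisy_observation`, the Gaussian entries, the two-point spectrum of $A_n$, and the values of `n` and `sigma` are all assumptions chosen for illustration.

```python
import numpy as np

def wigner(n, rng):
    """Symmetric Gaussian matrix with unit-variance off-diagonal entries (assumed noise law)."""
    g = rng.standard_normal((n, n))
    return (g + g.T) / np.sqrt(2.0)

def noisy_observation(a, sigma, rng):
    """Return A + sigma * Z / sqrt(n), where Z is a Wigner matrix."""
    n = a.shape[0]
    return a + sigma * wigner(n, rng) / np.sqrt(n)

rng = np.random.default_rng(0)
n, sigma = 400, 0.5

# Hypothetical ground truth: half of the eigenvalues at -1, half at +1.
true_eigs = np.repeat([-1.0, 1.0], n // 2)
a = np.diag(true_eigs)

a_hat = noisy_observation(a, sigma, rng)
noisy_eigs = np.linalg.eigvalsh(a_hat)

# Pure noise (A = 0): the spectrum fills roughly the interval [-2*sigma, 2*sigma].
noise_eigs = np.linalg.eigvalsh(noisy_observation(np.zeros((n, n)), sigma, rng))

print("largest true eigenvalue :", true_eigs.max())
print("largest noisy eigenvalue:", noisy_eigs.max())
print("largest pure-noise eigenvalue:", noise_eigs.max())
```

In this sketch the observed eigenvalues are more spread out than the true ones: their second moment is close to that of the true spectrum plus $\sigma^2$. Applying a function directly to the eigenvalues of $\hat{A}_n$ would inherit that distortion, which is the motivation for shrinking them first.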
-
$σ$-Ridge: group regularized ridge regression via empirical Bayes noise level cross-validation
Authors:
Nikolaos Ignatiadis,
Panagiotis Lolas
Abstract:
Features in predictive models are not exchangeable, yet common supervised models treat them as such. Here we study ridge regression when the analyst can partition the features into $K$ groups based on external side-information. For example, in high-throughput biology, features may represent gene expression, protein abundance or clinical data and so each feature group represents a distinct modality. The analyst's goal is to choose optimal regularization parameters $\lambda = (\lambda_1, \dotsc, \lambda_K)$ -- one for each group. In this work, we study the impact of $\lambda$ on the predictive risk of group-regularized ridge regression by deriving limiting risk formulae under a high-dimensional random effects model with $p\asymp n$ as $n \to \infty$. Furthermore, we propose a data-driven method for choosing $\lambda$ that attains the optimal asymptotic risk. The key idea is to interpret the residual noise variance $\sigma^2$ as a regularization parameter to be chosen through cross-validation. An empirical Bayes construction maps the one-dimensional parameter $\sigma$ to the $K$-dimensional vector of regularization parameters, i.e., $\sigma \mapsto \widehat{\lambda}(\sigma)$. Beyond its theoretical optimality, the proposed method is practical and runs as fast as cross-validated ridge regression without feature groups ($K=1$).
Submitted 4 March, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
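The group-regularized ridge estimator described in this abstract has a closed form once the vector $\lambda$ is fixed. The sketch below computes it under an assumed objective, $\frac{1}{2n}\lVert y - Xw\rVert^2 + \sum_{g} \frac{\lambda_g}{2}\lVert w_g\rVert^2$; the scaling of the penalty, the function name `group_ridge`, and the synthetic data are illustrative assumptions. It does not implement the empirical Bayes map $\sigma \mapsto \widehat{\lambda}(\sigma)$ or the cross-validation step proposed in the paper.

```python
import numpy as np

def group_ridge(x, y, groups, lam):
    """Ridge regression with one penalty per feature group.

    groups[j] is the group index (0..K-1) of feature j, and lam[g] is the
    penalty for group g. Solves (X^T X + n * diag(lam[groups])) w = X^T y,
    the normal equations of the assumed objective stated in the lead-in.
    """
    groups = np.asarray(groups)
    lam = np.asarray(lam, dtype=float)
    n = x.shape[0]
    penalty = np.diag(n * lam[groups])
    return np.linalg.solve(x.T @ x + penalty, x.T @ y)

rng = np.random.default_rng(0)
n, p = 200, 6
x = rng.standard_normal((n, p))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
y = x @ w_true + 0.1 * rng.standard_normal(n)

# Two hypothetical groups: features 0-2 (informative) and 3-5 (noise).
groups = [0, 0, 0, 1, 1, 1]

w_light = group_ridge(x, y, groups, lam=[0.01, 0.01])  # same penalty: ordinary ridge
w_group = group_ridge(x, y, groups, lam=[0.01, 1e6])   # second group heavily penalized

print("equal penalties :", np.round(w_light, 3))
print("group penalties :", np.round(w_group, 3))
```

With equal entries in `lam` the estimator reduces to ordinary ridge regression, which corresponds to the $K=1$ case mentioned in the abstract.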
-
Regularization in High-Dimensional Regression and Classification via Random Matrix Theory
Authors:
Panagiotis Lolas
Abstract:
We study general singular value shrinkage estimators in high-dimensional regression and classification, when the number of features and the sample size both grow to infinity proportionally. We allow models with general covariance matrices that include a large class of data generating distributions. As applications of our results, we find exact asymptotic formulas for both the training and test errors in regression models fitted by gradient descent, which provides theoretical insights for early stopping as a regularization method. In addition, we propose a numerical method based on the empirical spectra of covariance matrices for the optimal eigenvalue shrinkage classifier in linear discriminant analysis. Finally, we derive optimal estimators for the dense mean vectors of high-dimensional distributions. Throughout our analysis we rely on recent advances in random matrix theory and develop further results of independent mathematical interest.
Submitted 30 March, 2020;
originally announced March 2020.
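The link between gradient descent and singular value shrinkage mentioned in this abstract can be checked numerically. The sketch below relies on a standard identity rather than on the paper's results: for the loss $\frac{1}{2n}\lVert y - Xw\rVert^2$, gradient descent started at zero with step size $\eta$ gives, after $t$ steps, $w_t = V \,\mathrm{diag}\big((1-(1-\eta s_i^2/n)^t)/s_i\big)\, U^\top y$, where $X = U\,\mathrm{diag}(s)\,V^\top$. The function names, the loss scaling, the step size, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def gd_least_squares(x, y, eta, steps):
    """Plain gradient descent on (1/(2n)) * ||y - X w||^2, started at w = 0."""
    n, p = x.shape
    w = np.zeros(p)
    for _ in range(steps):
        w = w + (eta / n) * x.T @ (y - x @ w)
    return w

def gd_as_shrinkage(x, y, eta, steps):
    """Same iterate written as a singular value shrinkage estimator."""
    n = x.shape[0]
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Filter factor in [0, 1]: how much of each singular direction is fitted.
    filt = 1.0 - (1.0 - eta * s**2 / n) ** steps
    return vt.T @ ((filt / s) * (u.T @ y)), filt

rng = np.random.default_rng(0)
n, p = 50, 10
x = rng.standard_normal((n, p))
y = x @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

# Step size 1/L, with L the largest eigenvalue of X^T X / n, so the filters stay in [0, 1].
eta = n / np.linalg.norm(x, 2) ** 2

w_gd = gd_least_squares(x, y, eta, steps=20)
w_sv, filt = gd_as_shrinkage(x, y, eta, steps=20)

print("max difference:", np.max(np.abs(w_gd - w_sv)))
print("filter factors:", np.round(filt, 3))
```

Stopping early keeps the filter factors of small singular values close to zero, which is one way to see why early stopping acts as a regularizer.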
-
Interacting particle systems at the edge of multilevel Jack processes
Authors:
Evgeni Dimitrov,
Panagiotis Lolas
Abstract:
We consider a multilevel continuous time Markov chain $X(s;N) = (X_i^j(s;N): 1 \leq i \leq j \leq N)$, which is defined by means of Jack symmetric functions and forms a certain discretization of the multilevel Dyson Brownian motion. The process $X(s;N)$ describes the evolution of a discrete interlacing particle system with push-block interactions between the particles, which preserve the interlacing property. We study the joint asymptotic separation of the particles at the right edge of the ensemble as the number of levels and time tend to infinity and show that the limit is described by a certain zero range process with local interactions.
Submitted 11 December, 2016;
originally announced December 2016.