-
Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering
Authors:
Hugo Lebeau,
Florent Chatelain,
Romain Couillet
Abstract:
The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which have been left uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spike random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix and the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.
Submitted 27 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation
Authors:
Hugo Lebeau,
Florent Chatelain,
Romain Couillet
Abstract:
This work presents a comprehensive understanding of the estimation of a planted low-rank signal from a general spiked tensor model near the computational threshold. Relying on standard tools from the theory of large random matrices, we characterize the large-dimensional spectral behavior of the unfoldings of the data tensor and exhibit relevant signal-to-noise ratios governing the detectability of the principal directions of the signal. These results make it possible to accurately predict the reconstruction performance of truncated multilinear SVD (MLSVD) in the non-trivial regime. This is particularly important since it serves as an initialization of the higher-order orthogonal iteration (HOOI) scheme, whose convergence to the best low-multilinear-rank approximation depends entirely on its initialization. We give a sufficient condition for the convergence of HOOI and show that the number of iterations before convergence tends to $1$ in the large-dimensional limit.
Submitted 14 January, 2025; v1 submitted 5 February, 2024;
originally announced February 2024.
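As an illustration of the truncated multilinear SVD (MLSVD) named in the abstract above, here is a minimal NumPy sketch. It is not taken from the paper: the helper names `unfold`, `mode_product`, and `truncated_mlsvd` are hypothetical, and the construction is simply the textbook one (leading left singular vectors of each unfolding, then projection of the tensor onto them).

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: rows are indexed by the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted mode lands on axis 0
    return np.moveaxis(out, 0, mode)          # move it back into place

def truncated_mlsvd(T, ranks):
    """Return (core, factors) of a truncated MLSVD with the given multilinear ranks."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading r left singular vectors of the mode-`mode` unfolding.
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto each subspace
    return core, factors

def reconstruct(core, factors):
    """Rebuild the low-multilinear-rank approximation from core and factors."""
    T_hat = core
    for mode, U in enumerate(factors):
        T_hat = mode_product(T_hat, U, mode)
    return T_hat
```

In the setting described in the abstract, the output of such a routine would serve as the starting point of HOOI; the sketch above does not implement HOOI itself.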
-
Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering
Authors:
Romain Couillet,
Florent Chatelain,
Nicolas Le Bihan
Abstract:
The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists in randomly "puncturing" both the data matrix $X\in\mathbb{C}^{p\times n}$ (or $\mathbb{R}^{p\times n}$) and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks: $S\in\{0,1\}^{p\times n}$ for $X$ and $B\in\{0,1\}^{n\times n}$ for $K$. The resulting "two-way punctured" kernel is thus given by $K=\frac{1}{p}[(X \odot S)^{\sf H} (X \odot S)] \odot B$. We demonstrate that, for $X$ composed of independent columns drawn from a Gaussian mixture model, as $n,p\to\infty$ with $p/n\to c_0\in(0,\infty)$, the spectral behavior of $K$ -- its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors -- is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study thus opens the path towards rethinking, from a large-dimensional standpoint, computational and storage costs in elementary machine learning models.
Submitted 17 May, 2021; v1 submitted 24 February, 2021;
originally announced February 2021.
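The two-way punctured kernel $K=\frac{1}{p}[(X \odot S)^{\sf H} (X \odot S)] \odot B$ from the abstract above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `two_way_punctured_kernel` and the keep-probabilities `eps_S` and `eps_B` are hypothetical names, and the choices of a symmetric mask $B$ with an unmasked diagonal are assumptions made here so that $K$ stays Hermitian; the abstract does not specify them.

```python
import numpy as np

def two_way_punctured_kernel(X, eps_S, eps_B, rng):
    """Sketch of K = (1/p) [(X*S)^H (X*S)] * B with Bernoulli masks S and B.

    eps_S, eps_B are the probabilities of KEEPING an entry (assumed names).
    """
    p, n = X.shape
    # Bernoulli mask on the data matrix: each entry kept with probability eps_S.
    S = rng.random((p, n)) < eps_S
    Xs = X * S
    # Bernoulli mask on the kernel. Assumption: B is symmetric with a fully
    # kept diagonal, built from the strict upper triangle.
    upper = np.triu(rng.random((n, n)) < eps_B, k=1).astype(float)
    B = upper + upper.T + np.eye(n)
    # Entrywise (Hadamard) product of the punctured Gram matrix with B.
    return (Xs.conj().T @ Xs) / p * B
```

With `eps_S = eps_B = 1.0` nothing is punctured and the function returns the usual Gram matrix $X^{\sf H}X/p$. A practical implementation would use sparse storage to actually realize the cost savings, which this dense sketch does not attempt.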
-
Isotropic Multiple Scattering Processes on Hyperspheres
Authors:
Nicolas Le Bihan,
Florent Chatelain,
Jonathan H. Manton
Abstract:
This paper presents several results about isotropic random walks and multiple scattering processes on hyperspheres ${\mathbb S}^{p-1}$. It allows one to derive the Fourier expansions on ${\mathbb S}^{p-1}$ of these processes. A result of unimodality for the multiconvolution of symmetrical probability density functions (pdf) on ${\mathbb S}^{p-1}$ is also introduced. Such processes are then studied in the case where the scattering distribution is von Mises-Fisher (vMF). Asymptotic distributions for the multiconvolution of vMFs on ${\mathbb S}^{p-1}$ are obtained. Both the Fourier expansion and the asymptotic approximation allow us to compute estimation bounds for the parameters of Compound Cox Processes (CCP) on ${\mathbb S}^{p-1}$.
Submitted 13 December, 2015; v1 submitted 12 August, 2014;
originally announced August 2014.