-
Optimality of Approximate Message Passing Algorithms for Spiked Matrix Models with Rotationally Invariant Noise
Authors:
Rishabh Dudeja,
Songbin Liu,
Junjie Ma
Abstract:
We study the problem of estimating a rank one signal matrix from an observed matrix generated by corrupting the signal with additive rotationally invariant noise. We develop a new class of approximate message-passing algorithms for this problem and provide a simple and concise characterization of their dynamics in the high-dimensional limit. At each iteration, these algorithms exploit prior knowledge about the noise structure by applying a non-linear matrix denoiser to the eigenvalues of the observed matrix and prior information regarding the signal structure by applying a non-linear iterate denoiser to the previous iterates generated by the algorithm. We exploit our result on the dynamics of these algorithms to derive the optimal choices for the matrix and iterate denoisers. We show that the resulting algorithm achieves the smallest possible asymptotic estimation error among a broad class of iterative algorithms under a fixed iteration budget.
Submitted 28 May, 2024;
originally announced May 2024.
-
Spectral Universality of Regularized Linear Regression with Nearly Deterministic Sensing Matrices
Authors:
Rishabh Dudeja,
Subhabrata Sen,
Yue M. Lu
Abstract:
It has been observed that the performance of many high-dimensional estimation problems is universal with respect to the underlying sensing (or design) matrices. Specifically, matrices with markedly different constructions seem to achieve identical performance if they share the same spectral distribution and have "generic" singular vectors. We prove this universality phenomenon for the case of convex regularized least squares (RLS) estimators under a linear regression model with additive Gaussian noise. Our main contributions are two-fold: (1) We introduce a notion of universality classes for sensing matrices, defined through a set of deterministic conditions that fix the spectrum of the sensing matrix and precisely capture the previously heuristic notion of generic singular vectors; (2) We show that for all sensing matrices that lie in the same universality class, the dynamics of the proximal gradient descent algorithm for solving the regression problem, as well as the performance of RLS estimators themselves (under additional strong convexity conditions), are asymptotically identical. In addition to including i.i.d. Gaussian and rotationally invariant matrices as special cases, our universality class also contains highly structured, strongly correlated, or even (nearly) deterministic matrices. Examples of the latter include randomly signed versions of incoherent tight frames and randomly subsampled Hadamard transforms. As a consequence of this universality principle, the asymptotic performance of regularized linear regression on many structured matrices constructed with limited randomness can be characterized by using the rotationally invariant ensemble as an equivalent yet mathematically more tractable surrogate.
Submitted 20 July, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity
Authors:
Rishabh Dudeja,
Daniel Hsu
Abstract:
Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory-bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly used algorithms, such as gradient descent and the power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
Submitted 20 January, 2024; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Universality of Approximate Message Passing with Semi-Random Matrices
Authors:
Rishabh Dudeja,
Yue M. Lu,
Subhabrata Sen
Abstract:
Approximate Message Passing (AMP) is a class of iterative algorithms that have found applications in many problems in high-dimensional statistics and machine learning. In its general form, AMP can be formulated as an iterative procedure driven by a matrix $\mathbf{M}$. Theoretical analyses of AMP typically assume strong distributional properties on $\mathbf{M}$, for example that $\mathbf{M}$ has i.i.d. sub-Gaussian entries or is drawn from a rotationally invariant ensemble. However, numerical experiments suggest that the behavior of AMP is universal, as long as the eigenvectors of $\mathbf{M}$ are generic. In this paper, we take the first step in rigorously understanding this universality phenomenon. In particular, we investigate a class of memory-free AMP algorithms (proposed by Çakmak and Opper for mean-field Ising spin glasses), and show that their asymptotic dynamics are universal on a broad class of semi-random matrices. In addition to having the standard rotationally invariant ensemble as a special case, the class of semi-random matrices that we define in this work also includes matrices constructed with very limited randomness. One such example is a randomly signed version of the Sine model, introduced by Marinari, Parisi, Potters, and Ritort for spin glasses with fully deterministic couplings.
Submitted 1 May, 2023; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Universality of Linearized Message Passing for Phase Retrieval with Structured Sensing Matrices
Authors:
Rishabh Dudeja,
Milad Bakhshizadeh
Abstract:
In the phase retrieval problem, one seeks to recover an unknown $n$-dimensional signal vector $\mathbf{x}$ from $m$ measurements of the form $y_i = |(\mathbf{A} \mathbf{x})_i|$, where $\mathbf{A}$ denotes the sensing matrix. Many algorithms for this problem are based on approximate message passing. For these algorithms, it is known that if the sensing matrix $\mathbf{A}$ is generated by sub-sampling $n$ columns of a uniformly random (i.e., Haar distributed) orthogonal matrix, then in the high-dimensional asymptotic regime ($m,n \rightarrow \infty, n/m \rightarrow \kappa$), the dynamics of the algorithm are given by a deterministic recursion known as the state evolution. For a special class of linearized message-passing algorithms, we show that the state evolution is universal: it continues to hold even when $\mathbf{A}$ is generated by randomly sub-sampling columns of the Hadamard-Walsh matrix, provided the signal is drawn from a Gaussian prior.
Submitted 9 June, 2022; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Statistical Query Lower Bounds for Tensor PCA
Authors:
Rishabh Dudeja,
Daniel Hsu
Abstract:
In the Tensor PCA problem introduced by Richard and Montanari (2014), one is given a dataset consisting of $n$ samples $\mathbf{T}_{1:n}$ of i.i.d. Gaussian tensors of order $k$ with the promise that $\mathbb{E}\mathbf{T}_1$ is a rank-1 tensor and $\|\mathbb{E} \mathbf{T}_1\| = 1$. The goal is to estimate $\mathbb{E} \mathbf{T}_1$. This problem exhibits a large conjectured hard phase when $k>2$: When $d \lesssim n \ll d^{\frac{k}{2}}$ it is information theoretically possible to estimate $\mathbb{E} \mathbf{T}_1$, but no polynomial time estimator is known. We provide a sharp analysis of the optimal sample complexity in the Statistical Query (SQ) model and show that SQ algorithms with polynomial query complexity not only fail to solve Tensor PCA in the conjectured hard phase, but also have a strictly sub-optimal sample complexity compared to some polynomial time estimators such as the Richard-Montanari spectral estimator. Our analysis reveals that the optimal sample complexity in the SQ model depends on whether $\mathbb{E} \mathbf{T}_1$ is symmetric or not. For symmetric, even order tensors, we also isolate a sample size regime in which it is possible to test if $\mathbb{E} \mathbf{T}_1 = \mathbf{0}$ or $\mathbb{E}\mathbf{T}_1 \neq \mathbf{0}$ with polynomially many queries but not estimate $\mathbb{E}\mathbf{T}_1$. Our proofs rely on the Fourier analytic approach of Feldman, Perkins and Vempala (2018) to prove sharp SQ lower bounds.
Submitted 13 February, 2021; v1 submitted 10 August, 2020;
originally announced August 2020.
-
Information Theoretic Limits for Phase Retrieval with Subsampled Haar Sensing Matrices
Authors:
Rishabh Dudeja,
Junjie Ma,
Arian Maleki
Abstract:
We study information-theoretic limits of recovering an unknown $n$-dimensional, complex signal vector $\mathbf{x}_\star$ with unit norm from $m$ magnitude-only measurements of the form $y_i = |(\mathbf{A} \mathbf{x}_\star)_i|^2, \; i = 1, 2, \dots, m$, where $\mathbf{A}$ is the sensing matrix. This is known as the Phase Retrieval problem and models practical imaging systems where measuring the phase of the observations is difficult. Since the sensing matrix has orthogonal columns in a number of applications, we model it as a subsampled Haar matrix formed by picking $n$ columns of a uniformly random $m \times m$ unitary matrix. We study this problem in the high-dimensional asymptotic regime, where $m,n \rightarrow \infty$ while $m/n \rightarrow \delta$ with $\delta$ being a fixed number, and show that if $m < (2-o_n(1))\cdot n$, then any estimator is asymptotically orthogonal to the true signal vector $\mathbf{x}_\star$. This lower bound is sharp, since when $m > (2+o_n(1)) \cdot n$, estimators that achieve a nontrivial asymptotic correlation with the signal vector are known from previous works.
Submitted 4 August, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices
Authors:
Rishabh Dudeja,
Milad Bakhshizadeh,
Junjie Ma,
Arian Maleki
Abstract:
Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. Local search algorithms that work directly on the non-convex formulation of the problem have been very popular recently. Due to the non-convexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the asymptotic setting in which the number of measurements $m$ and the dimension of the signal $n$ diverge to infinity with $m/n = \delta \in (1,\infty)$, and obtain a simple expression for the overlap between the spectral estimator and the true signal vector.
Submitted 4 March, 2020; v1 submitted 6 March, 2019;
originally announced March 2019.