-
Harmless interpolation in regression and classification with structured features
Authors:
Andrew D. McRae,
Santhosh Karnik,
Mark A. Davenport,
Vidya Muthukumar
Abstract:
Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they furthermore show that harmless interpolation can occur in more general settings such as features that are a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.
Submitted 21 February, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Optimal convex lifted sparse phase retrieval and PCA with an atomic matrix norm regularizer
Authors:
Andrew D. McRae,
Justin Romberg,
Mark A. Davenport
Abstract:
We present novel analysis and algorithms for solving sparse phase retrieval and sparse principal component analysis (PCA) with convex lifted matrix formulations. The key innovation is a new mixed atomic matrix norm that, when used as regularization, promotes low-rank matrices with sparse factors. We show that convex programs with this atomic norm as a regularizer provide near-optimal sample complexity and error rate guarantees for sparse phase retrieval and sparse PCA. While we do not know how to solve the convex programs exactly with an efficient algorithm, for the phase retrieval case we carefully analyze the program and its dual and thereby derive a practical heuristic algorithm. We show empirically that this practical algorithm performs similarly to existing state-of-the-art algorithms.
Submitted 26 September, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Thomson's Multitaper Method Revisited
Authors:
Santhosh Karnik,
Justin Romberg,
Mark A. Davenport
Abstract:
Thomson's multitaper method estimates the power spectrum of a signal from $N$ equally spaced samples by averaging $K$ tapered periodograms. Discrete prolate spheroidal sequences (DPSS) are used as tapers since they provide excellent protection against spectral leakage. Thomson's multitaper method is widely used in applications, but most of the existing theory is qualitative or asymptotic. Furthermore, many practitioners use a DPSS bandwidth $W$ and number of tapers that are smaller than what the theory suggests is optimal because the computational requirements increase with the number of tapers. We revisit Thomson's multitaper method from a linear algebra perspective involving subspace projections. This provides additional insight and helps us establish nonasymptotic bounds on some statistical properties of the multitaper spectral estimate, which are similar to existing asymptotic results. We show that using $K=2NW-O(\log(NW))$ tapers instead of the traditional $2NW-O(1)$ tapers better protects against spectral leakage, especially when the power spectrum has a high dynamic range. Our perspective also allows us to derive an $ε$-approximation to the multitaper spectral estimate which can be evaluated on a grid of frequencies using $O(\log(NW)\log\tfrac{1}{ε})$ FFTs instead of $K=O(NW)$ FFTs. This is useful in problems where many samples are taken, and thus, using many tapers is desirable.
Submitted 22 March, 2021;
originally announced March 2021.
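The estimator this abstract describes, an average of $K$ tapered periodograms, can be illustrated with a minimal NumPy sketch. The function names `slepian_tapers` and `multitaper_psd`, the `nfft` grid parameter, and the choice to obtain the tapers by a dense eigendecomposition of the prolate matrix are assumptions made for illustration. This is the plain $K$-FFT estimate, not the authors' $ε$-approximation.

```python
import numpy as np

def slepian_tapers(N, W, K):
    """Return the K leading eigenvectors (as columns) of the N x N prolate matrix."""
    n = np.arange(N)
    diff = n[:, None] - n[None, :]
    # B[m, n] = sin(2*pi*W*(m-n)) / (pi*(m-n)), with 2W on the diagonal.
    B = 2 * W * np.sinc(2 * W * diff)
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:K]     # indices of the K largest
    return eigvecs[:, order]

def multitaper_psd(x, W, K, nfft=None):
    """Average of K tapered periodograms on an nfft-point frequency grid."""
    x = np.asarray(x, dtype=float)
    N = x.size
    nfft = N if nfft is None else nfft
    tapers = slepian_tapers(N, W, K)          # unit-norm columns
    tapered = tapers * x[:, None]             # column k is taper k times the signal
    spectra = np.abs(np.fft.fft(tapered, n=nfft, axis=0)) ** 2
    return spectra.mean(axis=1)               # average over the K tapers
```

Calling `multitaper_psd(x, W=4 / len(x), K=7)` corresponds to $2NW = 8$ with one taper dropped; the dense eigendecomposition costs $O(N^3)$, so this sketch only suits short signals.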
-
Improved bounds for the eigenvalues of prolate spheroidal wave functions and discrete prolate spheroidal sequences
Authors:
Santhosh Karnik,
Justin Romberg,
Mark A. Davenport
Abstract:
The discrete prolate spheroidal sequences (DPSSs) are a set of orthonormal sequences in $\ell_2(\mathbb{Z})$ which are strictly bandlimited to a frequency band $[-W,W]$ and maximally concentrated in a time interval $\{0,\ldots,N-1\}$. The timelimited DPSSs (sometimes referred to as the Slepian basis) are an orthonormal set of vectors in $\mathbb{C}^N$ whose discrete time Fourier transform (DTFT) is maximally concentrated in a frequency band $[-W,W]$. Due to these properties, DPSSs have a wide variety of signal processing applications. The DPSSs are the eigensequences of a timelimit-then-bandlimit operator and the Slepian basis vectors are the eigenvectors of the so-called prolate matrix. The eigenvalues in both cases are the same, and they exhibit a particular clustering behavior -- slightly fewer than $2NW$ eigenvalues are very close to $1$, slightly fewer than $N-2NW$ eigenvalues are very close to $0$, and very few eigenvalues are not near $1$ or $0$. This eigenvalue behavior is critical in many of the applications in which DPSSs are used. There are many asymptotic characterizations of the number of eigenvalues not near $0$ or $1$. In contrast, there are very few non-asymptotic results, and these don't fully characterize the clustering behavior of the DPSS eigenvalues. In this work, we establish two novel non-asymptotic bounds on the number of DPSS eigenvalues between $ε$ and $1-ε$. Also, we obtain bounds detailing how close the first $\approx 2NW$ eigenvalues are to $1$ and how close the last $\approx N-2NW$ eigenvalues are to $0$. Furthermore, we extend these results to the eigenvalues of the prolate spheroidal wave functions (PSWFs), which are the continuous-time version of the DPSSs. Finally, we present numerical experiments demonstrating the quality of our non-asymptotic bounds on the number of DPSS eigenvalues between $ε$ and $1-ε$.
Submitted 28 September, 2020; v1 submitted 30 May, 2020;
originally announced June 2020.
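The clustering behavior described in this abstract can be observed numerically. The sketch below builds the prolate matrix, computes its eigenvalues, and counts how many fall strictly between $ε$ and $1-ε$. The helper names `prolate_eigenvalues` and `count_transition` are hypothetical, and the dense eigensolver is an illustrative choice whose accuracy is limited to roughly machine precision for eigenvalues very close to $0$ or $1$.

```python
import numpy as np

def prolate_eigenvalues(N, W):
    """Eigenvalues of the N x N prolate matrix, sorted in descending order."""
    n = np.arange(N)
    diff = n[:, None] - n[None, :]
    # Entries sin(2*pi*W*(m-n)) / (pi*(m-n)), with 2W on the diagonal.
    B = 2 * W * np.sinc(2 * W * diff)
    return np.linalg.eigvalsh(B)[::-1]

def count_transition(eigs, eps):
    """Number of eigenvalues strictly between eps and 1 - eps."""
    return int(np.sum((eigs > eps) & (eigs < 1 - eps)))

if __name__ == "__main__":
    N, W = 128, 0.125                      # 2NW = 32
    eigs = prolate_eigenvalues(N, W)
    print("eigenvalues above 1/2:", int(np.sum(eigs > 0.5)))
    print("eigenvalues in (1e-3, 1 - 1e-3):", count_transition(eigs, 1e-3))
```

The trace of the prolate matrix is exactly $2NW$, which gives a quick consistency check on the computed eigenvalues.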
-
Low-rank matrix completion and denoising under Poisson noise
Authors:
Andrew D. McRae,
Mark A. Davenport
Abstract:
This paper considers the problem of estimating a low-rank matrix from the observation of all or a subset of its entries in the presence of Poisson noise. When we observe all entries, this is a problem of matrix denoising; when we observe only a subset of the entries, this is a problem of matrix completion. In both cases, we exploit an assumption that the underlying matrix is low-rank. Specifically, we analyze several estimators, including a constrained nuclear-norm minimization program, nuclear-norm regularized least squares, and a nonconvex constrained low-rank optimization problem. We show that for all three estimators, with high probability, we have an upper error bound (in the Frobenius norm error metric) that depends on the matrix rank, the fraction of the elements observed, and maximal row and column sums of the true matrix. We furthermore show that the above results are minimax optimal (within a universal constant) in classes of matrices with low rank and bounded row and column sums. We also extend these results to handle the case of matrix multinomial denoising and completion.
Submitted 30 April, 2020; v1 submitted 11 July, 2019;
originally announced July 2019.
-
The Fast Slepian Transform
Authors:
Santhosh Karnik,
Zhihui Zhu,
Michael B. Wakin,
Justin Romberg,
Mark A. Davenport
Abstract:
The discrete prolate spheroidal sequences (DPSS's) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis - also known as the Slepian basis - this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time-frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as the traditional FFT based methods.
Submitted 10 August, 2017; v1 submitted 15 November, 2016;
originally announced November 2016.
-
1-Bit Matrix Completion
Authors:
Mark A. Davenport,
Yaniv Plan,
Ewout van den Berg,
Mary Wootters
Abstract:
In this paper we develop a theory of matrix completion for the extreme case of noisy 1-bit observations. Instead of observing a subset of the real-valued entries of a matrix M, we obtain a small number of binary (1-bit) measurements generated according to a probability distribution determined by the real-valued entries of M. The central question we ask is whether or not it is possible to obtain an accurate estimate of M from this data. In general this would seem impossible, but we show that the maximum likelihood estimate under a suitable constraint returns an accurate estimate of M when ||M||_{\infty} <= α and rank(M) <= r. If the log-likelihood is a concave function (e.g., the logistic or probit observation models), then we can obtain this maximum likelihood estimate by optimizing a convex program. In addition, we show that if instead of recovering M we simply wish to obtain an estimate of the distribution generating the 1-bit measurements, then we can eliminate the requirement that ||M||_{\infty} <= α. For both cases, we provide lower bounds showing that these estimates are near-optimal. We conclude with a suite of experiments that both verify the implications of our theorems and illustrate some of the practical applications of 1-bit matrix completion. In particular, we compare our program to standard matrix completion methods on movie rating data in which users submit ratings from 1 to 5. In order to use our program, we quantize this data to a single bit, but we allow the standard matrix completion program to have access to the original ratings (from 1 to 5). Surprisingly, the approach based on binary data performs significantly better.
Submitted 1 July, 2014; v1 submitted 17 September, 2012;
originally announced September 2012.
-
On the stability and accuracy of least squares approximations
Authors:
Albert Cohen,
Mark A. Davenport,
Dany Leviatan
Abstract:
We consider the problem of reconstructing an unknown function $f$ on a domain $X$ from samples of $f$ at $n$ randomly chosen points with respect to a given measure $ρ_X$. Given a sequence of linear spaces $(V_m)_{m>0}$ with ${\rm dim}(V_m)=m\leq n$, we study the least squares approximations from the spaces $V_m$. It is well known that such approximations can be inaccurate when $m$ is too close to $n$, even when the samples are noiseless. Our main result provides a criterion on $m$ that describes the needed amount of regularization to ensure that the least squares method is stable and that its accuracy, measured in $L^2(X,ρ_X)$, is comparable to the best approximation error of $f$ by elements from $V_m$. We illustrate this criterion for various approximation schemes, such as trigonometric polynomials, with $ρ_X$ being the uniform measure, and algebraic polynomials, with $ρ_X$ being either the uniform or Chebyshev measure. For such examples we also prove similar stability results using deterministic samples that are equispaced with respect to these measures.
Submitted 15 June, 2018; v1 submitted 18 November, 2011;
originally announced November 2011.
-
How well can we estimate a sparse vector?
Authors:
Emmanuel J. Candès,
Mark A. Davenport
Abstract:
The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random projection of the sparse vector followed by an $\ell_1$ estimation procedure such as the Dantzig selector. In this sense, compressive sensing techniques cannot essentially be improved.
Submitted 1 March, 2013; v1 submitted 27 April, 2011;
originally announced April 2011.
-
A simple proof that random matrices are democratic
Authors:
Mark A. Davenport,
Jason N. Laska,
Petros T. Boufounos,
Richard G. Baraniuk
Abstract:
The recently introduced theory of compressive sensing (CS) enables the reconstruction of sparse or compressible signals from a small set of nonadaptive, linear measurements. If properly chosen, the number of measurements can be significantly smaller than the ambient dimension of the signal and yet preserve the significant signal information. Interestingly, it can be shown that random measurement schemes provide a near-optimal encoding in terms of the required number of measurements. In this report, we explore another relatively unexplored, though often alluded to, advantage of using random matrices to acquire CS measurements. Specifically, we show that random matrices are democratic, meaning that each measurement carries roughly the same amount of signal information. We demonstrate that by slightly increasing the number of measurements, the system is robust to the loss of a small number of arbitrary measurements. In addition, we draw connections to oversampling and demonstrate stability from the loss of significantly more measurements.
Submitted 4 November, 2009;
originally announced November 2009.
-
Analysis of Orthogonal Matching Pursuit using the Restricted Isometry Property
Authors:
Mark A. Davenport,
Michael B. Wakin
Abstract:
Orthogonal Matching Pursuit (OMP) is the canonical greedy algorithm for sparse approximation. In this paper we demonstrate that the restricted isometry property (RIP) can be used for a very straightforward analysis of OMP. Our main conclusion is that the RIP of order $K+1$ (with isometry constant $δ< \frac{1}{3\sqrt{K}}$) is sufficient for OMP to exactly recover any $K$-sparse signal. Our analysis relies on simple and intuitive observations about OMP and matrices which satisfy the RIP. For restricted classes of $K$-sparse signals (those that are highly compressible), a relaxed bound on the isometry constant is also established. A deeper understanding of OMP may benefit the analysis of greedy algorithms in general. To demonstrate this, we also briefly revisit the analysis of the Regularized OMP (ROMP) algorithm.
Submitted 1 September, 2009;
originally announced September 2009.
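For readers unfamiliar with the algorithm analyzed in this abstract, here is a minimal sketch of the standard OMP iteration: pick the column most correlated with the residual, refit by least squares on the selected columns, and repeat $K$ times. The function name `omp` and its signature are assumptions for illustration. The code does not check the RIP condition from the abstract, so exact recovery is only to be expected when the matrix satisfies such a condition.

```python
import numpy as np

def omp(A, y, K):
    """Run K greedy OMP steps and return a K-sparse x with A @ x close to y."""
    n = A.shape[1]
    support = []
    residual = np.array(y, dtype=float)
    coef = np.zeros(0)
    for _ in range(K):
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0   # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x
```

Because the residual is orthogonal to every selected column after each least-squares step, the explicit zeroing of already-chosen correlations is only a guard against round-off.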