Search | arXiv e-print repository

Harmless interpolation in regression and classification with structured features

Authors: Andrew D. McRae, Santhosh Karnik, Mark A. Davenport, Vidya Muthukumar

Abstract: Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they furthermore show that harmless interpolation can occur in more general settings such as features that are a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features.

Submitted 21 February, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

arXiv:2111.04652 [pdf, other]

doi 10.1109/TIT.2022.3228508

Optimal convex lifted sparse phase retrieval and PCA with an atomic matrix norm regularizer

Authors: Andrew D. McRae, Justin Romberg, Mark A. Davenport

Abstract: We present novel analysis and algorithms for solving sparse phase retrieval and sparse principal component analysis (PCA) with convex lifted matrix formulations. The key innovation is a new mixed atomic matrix norm that, when used as regularization, promotes low-rank matrices with sparse factors. We show that convex programs with this atomic norm as a regularizer provide near-optimal sample complexity and error rate guarantees for sparse phase retrieval and sparse PCA. While we do not know how to solve the convex programs exactly with an efficient algorithm, for the phase retrieval case we carefully analyze the program and its dual and thereby derive a practical heuristic algorithm. We show empirically that this practical algorithm performs similarly to existing state-of-the-art algorithms.

Submitted 26 September, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

arXiv:2103.11586 [pdf, other]

Thomson's Multitaper Method Revisited

Authors: Santhosh Karnik, Justin Romberg, Mark A. Davenport

Abstract: Thomson's multitaper method estimates the power spectrum of a signal from $N$ equally spaced samples by averaging $K$ tapered periodograms. Discrete prolate spheroidal sequences (DPSS) are used as tapers since they provide excellent protection against spectral leakage. Thomson's multitaper method is widely used in applications, but most of the existing theory is qualitative or asymptotic. Furthermore, many practitioners use a DPSS bandwidth $W$ and number of tapers that are smaller than what the theory suggests is optimal because the computational requirements increase with the number of tapers. We revisit Thomson's multitaper method from a linear algebra perspective involving subspace projections. This provides additional insight and helps us establish nonasymptotic bounds on some statistical properties of the multitaper spectral estimate, which are similar to existing asymptotic results. We show using $K=2NW-O(\log(NW))$ tapers instead of the traditional $2NW-O(1)$ tapers better protects against spectral leakage, especially when the power spectrum has a high dynamic range. Our perspective also allows us to derive an $ε$-approximation to the multitaper spectral estimate which can be evaluated on a grid of frequencies using $O(\log(NW)\log\tfrac{1}ε)$ FFTs instead of $K=O(NW)$ FFTs. This is useful in problems where many samples are taken, and thus, using many tapers is desirable.

Submitted 22 March, 2021; originally announced March 2021.

Comments: 39 pages, 7 figures

MSC Class: 62-08; 62G05
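
As a rough illustration of the basic estimator this abstract describes (averaging $K$ tapered periodograms), here is a minimal NumPy sketch. It is not the paper's fast $ε$-approximation. The function names `slepian_tapers` and `multitaper_psd` are hypothetical, the tapers are obtained by a dense eigendecomposition of the prolate matrix (an assumption made for self-containment; a library DPSS routine would normally be used), and the values $N=256$, $NW=4$, $K=7$ are chosen only for the example.

```python
import numpy as np

def slepian_tapers(N, W, K):
    """Return K unit-norm tapers (shape (K, N)): the leading eigenvectors of the prolate matrix."""
    idx = np.arange(N)
    diff = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi x)/(pi x), so this is sin(2 pi W d)/(pi d), with 2W on the diagonal.
    B = 2 * W * np.sinc(2 * W * diff)
    _, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    return vecs[:, ::-1][:, :K].T      # keep the K eigenvectors with the largest eigenvalues

def multitaper_psd(x, W, K, nfft=None):
    """Average of K tapered periodograms, evaluated on an nfft-point frequency grid."""
    N = len(x)
    nfft = nfft or N
    tapers = slepian_tapers(N, W, K)
    spectra = np.fft.fft(tapers * x[None, :], n=nfft, axis=1)  # one FFT per taper
    return np.mean(np.abs(spectra) ** 2, axis=0)

# Illustrative setup (assumed values): white noise, N = 256, NW = 4, K = 2NW - 1 = 7.
rng = np.random.default_rng(0)
N, W, K = 256, 4 / 256, 7
x = rng.standard_normal(N)
psd = multitaper_psd(x, W, K)
```

Because the tapers are orthonormal, the average depends only on the subspace they span, which is in the spirit of the subspace-projection view mentioned in the abstract. The cost here is one FFT per taper, i.e. the $K=O(NW)$ FFTs that the abstract says can be reduced.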

arXiv:2006.00427 [pdf, other]

Improved bounds for the eigenvalues of prolate spheroidal wave functions and discrete prolate spheroidal sequences

Authors: Santhosh Karnik, Justin Romberg, Mark A. Davenport

Abstract: The discrete prolate spheroidal sequences (DPSSs) are a set of orthonormal sequences in $\ell_2(\mathbb{Z})$ which are strictly bandlimited to a frequency band $[-W,W]$ and maximally concentrated in a time interval $\{0,\ldots,N-1\}$. The timelimited DPSSs (sometimes referred to as the Slepian basis) are an orthonormal set of vectors in $\mathbb{C}^N$ whose discrete time Fourier transform (DTFT) is maximally concentrated in a frequency band $[-W,W]$. Due to these properties, DPSSs have a wide variety of signal processing applications. The DPSSs are the eigensequences of a timelimit-then-bandlimit operator and the Slepian basis vectors are the eigenvectors of the so-called prolate matrix. The eigenvalues in both cases are the same, and they exhibit a particular clustering behavior -- slightly fewer than $2NW$ eigenvalues are very close to $1$, slightly fewer than $N-2NW$ eigenvalues are very close to $0$, and very few eigenvalues are not near $1$ or $0$. This eigenvalue behavior is critical in many of the applications in which DPSSs are used. There are many asymptotic characterizations of the number of eigenvalues not near $0$ or $1$. In contrast, there are very few non-asymptotic results, and these don't fully characterize the clustering behavior of the DPSS eigenvalues. In this work, we establish two novel non-asymptotic bounds on the number of DPSS eigenvalues between $ε$ and $1-ε$. Also, we obtain bounds detailing how close the first $\approx 2NW$ eigenvalues are to $1$ and how close the last $\approx N-2NW$ eigenvalues are to $0$. Furthermore, we extend these results to the eigenvalues of the prolate spheroidal wave functions (PSWFs), which are the continuous-time version of the DPSSs. Finally, we present numerical experiments demonstrating the quality of our non-asymptotic bounds on the number of DPSS eigenvalues between $ε$ and $1-ε$.

Submitted 28 September, 2020; v1 submitted 30 May, 2020; originally announced June 2020.

Comments: 29 pages, 3 figures. V2 includes new results on the eigenvalues of the prolate spheroidal wave functions (PSWFs). The title has been modified to reflect this

MSC Class: 15B05
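
The eigenvalue clustering described in this abstract can be observed numerically. The sketch below builds the prolate matrix, with entries $\sin(2\pi W(m-n))/(\pi(m-n))$ and $2W$ on the diagonal, and counts eigenvalues near $1$, near $0$, and in between. The helper name `prolate_matrix` and the values $N=128$, $W=0.125$, $ε=10^{-3}$ are assumptions chosen for illustration; the script only counts eigenvalues and does not evaluate the paper's bounds.

```python
import numpy as np

def prolate_matrix(N, W):
    """N x N prolate matrix: sin(2*pi*W*(m-n)) / (pi*(m-n)) off the diagonal, 2W on it."""
    idx = np.arange(N)
    diff = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi x)/(pi x), which also handles the diagonal (diff = 0) correctly.
    return 2 * W * np.sinc(2 * W * diff)

# Illustrative values (assumed): 2NW = 32.
N, W, eps = 128, 0.125, 1e-3
eigvals = np.linalg.eigvalsh(prolate_matrix(N, W))[::-1]  # sorted in descending order

near_one = int(np.sum(eigvals > 1 - eps))    # eigenvalues within eps of 1
near_zero = int(np.sum(eigvals < eps))       # eigenvalues within eps of 0
transition = N - near_one - near_zero        # eigenvalues between eps and 1 - eps
print(near_one, transition, near_zero)
```

Varying `N`, `W`, and `eps` gives a quick empirical feel for how the size of the transition region changes.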

arXiv:1907.05325 [pdf, other]

doi 10.1093/imaiai/iaaa020

Low-rank matrix completion and denoising under Poisson noise

Authors: Andrew D. McRae, Mark A. Davenport

Abstract: This paper considers the problem of estimating a low-rank matrix from the observation of all or a subset of its entries in the presence of Poisson noise. When we observe all entries, this is a problem of matrix denoising; when we observe only a subset of the entries, this is a problem of matrix completion. In both cases, we exploit an assumption that the underlying matrix is low-rank. Specifically, we analyze several estimators, including a constrained nuclear-norm minimization program, nuclear-norm regularized least squares, and a nonconvex constrained low-rank optimization problem. We show that for all three estimators, with high probability, we have an upper error bound (in the Frobenius norm error metric) that depends on the matrix rank, the fraction of the elements observed, and maximal row and column sums of the true matrix. We furthermore show that the above results are minimax optimal (within a universal constant) in classes of matrices with low rank and bounded row and column sums. We also extend these results to handle the case of matrix multinomial denoising and completion.

Submitted 30 April, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

arXiv:1611.04950 [pdf, other]

The Fast Slepian Transform

Authors: Santhosh Karnik, Zhihui Zhu, Michael B. Wakin, Justin Romberg, Mark A. Davenport

Abstract: The discrete prolate spheroidal sequences (DPSS's) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis - also known as the Slepian basis - this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time-frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as the traditional FFT based methods.

Submitted 10 August, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

arXiv:1209.3672 [pdf, other]

1-Bit Matrix Completion

Authors: Mark A. Davenport, Yaniv Plan, Ewout van den Berg, Mary Wootters

Abstract: In this paper we develop a theory of matrix completion for the extreme case of noisy 1-bit observations. Instead of observing a subset of the real-valued entries of a matrix M, we obtain a small number of binary (1-bit) measurements generated according to a probability distribution determined by the real-valued entries of M. The central question we ask is whether or not it is possible to obtain an accurate estimate of M from this data. In general this would seem impossible, but we show that the maximum likelihood estimate under a suitable constraint returns an accurate estimate of M when ||M||_{\infty} <= α, and rank(M) <= r. If the log-likelihood is a concave function (e.g., the logistic or probit observation models), then we can obtain this maximum likelihood estimate by optimizing a convex program. In addition, we also show that if instead of recovering M we simply wish to obtain an estimate of the distribution generating the 1-bit measurements, then we can eliminate the requirement that ||M||_{\infty} <= α. For both cases, we provide lower bounds showing that these estimates are near-optimal. We conclude with a suite of experiments that both verify the implications of our theorems as well as illustrate some of the practical applications of 1-bit matrix completion. In particular, we compare our program to standard matrix completion methods on movie rating data in which users submit ratings from 1 to 5. In order to use our program, we quantize this data to a single bit, but we allow the standard matrix completion program to have access to the original ratings (from 1 to 5). Surprisingly, the approach based on binary data performs significantly better.

Submitted 1 July, 2014; v1 submitted 17 September, 2012; originally announced September 2012.

arXiv:1111.4422 [pdf, other]

On the stability and accuracy of least squares approximations

Authors: Albert Cohen, Mark A. Davenport, Dany Leviatan

Abstract: We consider the problem of reconstructing an unknown function $f$ on a domain $X$ from samples of $f$ at $n$ randomly chosen points with respect to a given measure $ρ_X$. Given a sequence of linear spaces $(V_m)_{m>0}$ with ${\rm dim}(V_m)=m\leq n$, we study the least squares approximations from the spaces $V_m$. It is well known that such approximations can be inaccurate when $m$ is too close to $n$, even when the samples are noiseless. Our main result provides a criterion on $m$ that describes the needed amount of regularization to ensure that the least squares method is stable and that its accuracy, measured in $L^2(X,ρ_X)$, is comparable to the best approximation error of $f$ by elements from $V_m$. We illustrate this criterion for various approximation schemes, such as trigonometric polynomials, with $ρ_X$ being the uniform measure, and algebraic polynomials, with $ρ_X$ being either the uniform or Chebyshev measure. For such examples we also prove similar stability results using deterministic samples that are equispaced with respect to these measures.

Submitted 15 June, 2018; v1 submitted 18 November, 2011; originally announced November 2011.

arXiv:1104.5246 [pdf, ps, other]

How well can we estimate a sparse vector?

Authors: Emmanuel J. Candès, Mark A. Davenport

Abstract: The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random projection of the sparse vector followed by an $\ell_1$ estimation procedure such as the Dantzig selector. In this sense, compressive sensing techniques cannot essentially be improved.

Submitted 1 March, 2013; v1 submitted 27 April, 2011; originally announced April 2011.

arXiv:0911.0736 [pdf, ps, other]

A simple proof that random matrices are democratic

Authors: Mark A. Davenport, Jason N. Laska, Petros T. Boufounos, Richard G. Baraniuk

Abstract: The recently introduced theory of compressive sensing (CS) enables the reconstruction of sparse or compressible signals from a small set of nonadaptive, linear measurements. If properly chosen, the number of measurements can be significantly smaller than the ambient dimension of the signal and yet preserve the significant signal information. Interestingly, it can be shown that random measurement schemes provide a near-optimal encoding in terms of the required number of measurements. In this report, we explore another relatively unexplored, though often alluded to, advantage of using random matrices to acquire CS measurements. Specifically, we show that random matrices are democratic, meaning that each measurement carries roughly the same amount of signal information. We demonstrate that by slightly increasing the number of measurements, the system is robust to the loss of a small number of arbitrary measurements. In addition, we draw connections to oversampling and demonstrate stability from the loss of significantly more measurements.

Submitted 4 November, 2009; originally announced November 2009.

Report number: Rice University Department of Electrical and Computer Engineering Technical Report TREE0906

MSC Class: 41A46; 68W20; 90C27

arXiv:0909.0083 [pdf, ps, other]

Analysis of Orthogonal Matching Pursuit using the Restricted Isometry Property

Authors: Mark A. Davenport, Michael B. Wakin

Abstract: Orthogonal Matching Pursuit (OMP) is the canonical greedy algorithm for sparse approximation. In this paper we demonstrate that the restricted isometry property (RIP) can be used for a very straightforward analysis of OMP. Our main conclusion is that the RIP of order $K+1$ (with isometry constant $δ< \frac{1}{3\sqrt{K}}$) is sufficient for OMP to exactly recover any $K$-sparse signal. Our analysis relies on simple and intuitive observations about OMP and matrices which satisfy the RIP. For restricted classes of $K$-sparse signals (those that are highly compressible), a relaxed bound on the isometry constant is also established. A deeper understanding of OMP may benefit the analysis of greedy algorithms in general. To demonstrate this, we also briefly revisit the analysis of the Regularized OMP (ROMP) algorithm.

Submitted 1 September, 2009; originally announced September 2009.

Comments: 11 pages, Submitted to IEEE Transactions on Information Theory

MSC Class: 41A46; 68Q25; 68W20; 90C27
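
For readers unfamiliar with the algorithm this abstract analyzes, the following is a minimal sketch of the standard textbook form of OMP in NumPy; the function name `omp` is hypothetical and the code is not taken from the paper. The demo matrix is an assumption chosen for reproducibility: an identity block concatenated with a normalized $64 \times 64$ Hadamard block, whose coherence is $1/8$. For that matrix, exact recovery of a 4-sparse vector follows from the classical coherence condition rather than from the RIP condition stated in the abstract.

```python
import numpy as np

def omp(A, y, K):
    """Run K iterations of Orthogonal Matching Pursuit; return the estimate and its support."""
    n = A.shape[1]
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(K):
        # Greedy step: pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0   # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Orthogonalization step: least-squares fit of y on all selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat, sorted(support)

# Demo dictionary (assumed for illustration): [I, H/8] with H a 64 x 64 Sylvester Hadamard matrix.
H = np.array([[1.0]])
for _ in range(6):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
A = np.hstack([np.eye(64), H / 8.0])      # 64 x 128, unit-norm columns

x = np.zeros(128)
x[[3, 40, 70, 100]] = [1.5, -2.0, 0.7, 1.0]   # a 4-sparse signal
y = A @ x
x_hat, support = omp(A, y, K=4)
```

The least-squares refit over the whole selected support is what distinguishes OMP from plain matching pursuit: after each iteration the residual is orthogonal to every column chosen so far.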

Showing 1–11 of 11 results for author: Davenport, M A