Search | arXiv e-print repository

On the existence of optimal multi-valued decoders and their accuracy bounds for undersampled inverse problems

Authors: Nina Maria Gottschling, Paolo Campodonico, Vegard Antun, Anders C. Hansen

Abstract: Undersampled inverse problems occur everywhere in the sciences including medical imaging, radar, astronomy etc., yielding underdetermined linear or non-linear reconstruction problems. There are now a myriad of techniques to design decoders that can tackle such problems, ranging from optimization based approaches, such as compressed sensing, to deep learning (DL), and variants in between the two techniques. The variety of methods begs for a unifying approach to determine the existence of optimal decoders and fundamental accuracy bounds, in order to facilitate a theoretical and empirical understanding of the performance of existing and future methods. Such a theory must allow for both single-valued and multi-valued decoders, as underdetermined inverse problems typically have multiple solutions. Indeed, multi-valued decoders arise due to non-uniqueness of minimizers in optimisation problems, such as in compressed sensing, and for DL based decoders in generative adversarial models, such as diffusion models and ensemble models. In this work we provide a framework for assessing the lowest possible reconstruction accuracy in terms of worst- and average-case errors. The universal bounds only depend on the measurement model $F$, the model class $\mathcal{M}_1 \subseteq \mathcal{X}$ and the noise model $\mathcal{E}$. For linear $F$ these bounds depend on its kernel, and in the non-linear case the concept of kernel is generalized for undersampled settings. Additionally, we provide multi-valued variational solutions that obtain the lowest possible reconstruction error.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2307.07410 [pdf, other]

Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

Authors: Johan S. Wind, Vegard Antun, Anders C. Hansen

Abstract: Understanding the implicit regularization imposed by neural network architectures and gradient based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, potentially surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is well-known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds w.r.t. the initialization size. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem -- which is a contradiction -- thus implying sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.

Submitted 13 July, 2023; originally announced July 2023.

MSC Class: 90C25; 68T07; 90C17 (Primary) 15A29; 94A08; 46N10 (Secondary)

arXiv:2106.00554 [pdf, other]

Recovering wavelet coefficients from binary samples using fast transforms

Authors: Vegard Antun

Abstract: Recovering a signal (function) from finitely many binary or Fourier samples is one of the core problems in modern medical imaging, and by now there exist a plethora of methods for recovering a signal from such samples. Examples of methods, which can utilise wavelet reconstruction, include generalised sampling, infinite-dimensional compressive sensing, the parameterised-background data-weak (PBDW) method etc. However, for any of these methods to be applied in practice, accurate and fast modelling of an $N \times M$ section of the infinite-dimensional change-of-basis matrix between the sampling basis (Fourier or Walsh-Hadamard samples) and the wavelet reconstruction basis is paramount. In this work, we derive an algorithm, which bypasses the $NM$ storage requirement and the $\mathcal{O}(NM)$ computational cost of matrix-vector multiplication with this matrix when using Walsh-Hadamard samples and wavelet reconstruction. The proposed algorithm computes the matrix-vector multiplication in $\mathcal{O}(N\log N)$ operations and has a storage requirement of $\mathcal{O}(2^q)$, where $N=2^{dq} M$ (usually $q \in \{1,2\}$) and $d=1,2$ is the dimension. As matrix-vector multiplication is the computational bottleneck for iterative algorithms used by the mentioned reconstruction methods, the proposed algorithm speeds up the reconstruction of wavelet coefficients from Walsh-Hadamard samples considerably.

Submitted 1 June, 2021; originally announced June 2021.

MSC Class: 94A20; 94A11; 42C10; 42C40; 46C05

arXiv:2101.08286 [pdf, other]

doi 10.1073/pnas.2107151119

Can stable and accurate neural networks be computed? -- On the barriers of deep learning and Smale's 18th problem

Authors: Matthew J. Colbrook, Vegard Antun, Anders C. Hansen

Abstract: Deep learning (DL) has had unprecedented success and is now entering scientific computing with full force. However, current DL methods typically suffer from instability, even when universal approximation properties guarantee the existence of stable neural networks (NNs). We address this paradox by demonstrating basic well-conditioned problems in scientific computing where one can prove the existence of NNs with great approximation qualities; however, there does not exist any algorithm, even randomised, that can train (or compute) such a NN. For any positive integers $K > 2$ and $L$, there are cases where simultaneously: (a) no randomised training algorithm can compute a NN correct to $K$ digits with probability greater than $1/2$, (b) there exists a deterministic training algorithm that computes a NN with $K-1$ correct digits, but any such (even randomised) algorithm needs arbitrarily many training data, (c) there exists a deterministic training algorithm that computes a NN with $K-2$ correct digits using no more than $L$ training samples. These results imply a classification theory describing conditions under which (stable) NNs with a given accuracy can be computed by an algorithm. We begin this theory by establishing sufficient conditions for the existence of algorithms that compute stable NNs in inverse problems. We introduce Fast Iterative REstarted NETworks (FIRENETs), which we both prove and numerically verify are stable. Moreover, we prove that only $\mathcal{O}(|\log(ε)|)$ layers are needed for an $ε$-accurate solution to the inverse problem.

Submitted 15 April, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Comments: 14 pages + SI Appendix

Journal ref: Proc. Natl. Acad. Sci. USA, 2022

Showing 1–4 of 4 results for author: Antun, V