On best approximation by multivariate ridge functions with applications to generalized translation networks

Authors: Paul Geuchen, Palina Salanevich, Olov Schavemaker, Felix Voigtlaender

Abstract: We prove sharp upper and lower bounds for the approximation of Sobolev functions by sums of multivariate ridge functions, i.e., functions of the form $\mathbb{R}^d \ni x \mapsto \sum_{k=1}^n h_k(A_k x) \in \mathbb{R}$ with $h_k : \mathbb{R}^\ell \to \mathbb{R}$ and $A_k \in \mathbb{R}^{\ell \times d}$. We show that the order of approximation asymptotically behaves as $n^{-r/(d-\ell)}$, where $r$ is the regularity of the Sobolev functions to be approximated. Our lower bound even holds when approximating $L^\infty$-Sobolev functions of regularity $r$ with error measured in $L^1$, while our upper bound applies to the approximation of $L^p$-Sobolev functions in $L^p$ for any $1 \leq p \leq \infty$. These bounds generalize well-known results about the approximation properties of univariate ridge functions to the multivariate case. Moreover, we use these bounds to obtain sharp asymptotic bounds for the approximation of Sobolev functions using generalized translation networks and complex-valued neural networks.

Submitted 27 March, 2025; v1 submitted 11 December, 2024; originally announced December 2024.

MSC Class: 41A30; 41A25; 41A63; 46E35; 68T07

arXiv:2411.18155 [pdf, ps, other]

Besov regularity of random wavelet series

Authors: Andreas Horst, Thomas Jahn, Felix Voigtlaender

Abstract: We study the Besov regularity of wavelet series on $\mathbb{R}^d$ with randomly chosen coefficients. More precisely, each coefficient is a product of a random factor and a parameterized deterministic factor (decaying with the scale $j$ and the norm of the shift $m$). Compared to the literature, we impose relatively mild conditions on the moments of the random variables in order to characterize the almost sure convergence of the wavelet series in Besov spaces $B^s_{p,q}(\mathbb{R}^d)$ and the finiteness of the moments as well as of the moment generating function of the Besov norm. In most cases, we achieve a complete characterization, i.e., the derived conditions are both necessary and sufficient.

Submitted 27 November, 2024; originally announced November 2024.

arXiv:2411.08416 [pdf, ps, other]

On wavelet coorbit spaces associated to different dilation groups

Authors: Hartmut Führ, Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: This paper develops methods based on coarse geometry for the comparison of wavelet coorbit spaces defined by different dilation groups, with emphasis on establishing a unified approach to both irreducible and reducible quasi-regular representations. We show that the use of reducible representations is essential to include a variety of examples, such as anisotropic Besov spaces defined by general expansive matrices, in a common framework. The obtained criteria yield, among others, a simple characterization of subgroups of a dilation group yielding the same coorbit spaces. They also allow one to clarify which anisotropic Besov spaces have an alternative description as coorbit spaces associated to irreducible quasi-regular representations.

Submitted 13 November, 2024; originally announced November 2024.

Comments: To appear in a special volume dedicated to K. Gröchenig

arXiv:2409.01849 [pdf, ps, other]

Discrete Triebel-Lizorkin spaces and expansive matrices

Authors: Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: We provide a characterization of two expansive dilation matrices yielding equal discrete anisotropic Triebel-Lizorkin spaces. For two such matrices $A$ and $B$, it is shown that $\dot{\mathbf{f}}^α_{p,q}(A) = \dot{\mathbf{f}}^α_{p,q}(B)$ for all $α\in \mathbb{R}$ and $p, q \in (0, \infty]$ if and only if the set $\{A^j B^{-j} : j \in \mathbb{Z}\}$ is finite, or in the trivial case when $p = q$ and $|\det(A)|^{α+ 1/2 - 1/p} = |\det(B)|^{α+ 1/2 - 1/p}$. This provides an extension of a result by Triebel for diagonal dilations to arbitrary expansive matrices. The obtained classification of dilations is different from corresponding results for anisotropic Triebel-Lizorkin function spaces.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2407.16595 [pdf, ps, other]

Smoothness spaces for warped time-frequency representations -- Decomposition spaces and embedding relations

Authors: Nicki Holighaus, Felix Voigtlaender

Abstract: In a recent paper, we have shown that warped time-frequency representations provide a rich framework for the construction and study of smoothness spaces matched to very general phase space geometries obtained by diffeomorphic deformations of $\mathbb{R}^d$. Here, we study these spaces, obtained through the application of general coorbit theory, using the framework of decomposition spaces. This allows us to derive embedding relations between coorbit spaces associated to different warping functions, and relate them to established, important smoothness spaces. In particular, we show that we obtain $α$-modulation spaces and spaces of dominating mixed smoothness as special cases and, in contrast, that this is only possible for Besov spaces if $d=1$.

Submitted 23 July, 2024; originally announced July 2024.

MSC Class: 42B35; 46E35; 46F05; 46F12

arXiv:2311.07368 [pdf, ps, other]

Classification of anisotropic local Hardy spaces and inhomogeneous Triebel-Lizorkin spaces

Authors: Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: This paper provides a characterization of when two expansive matrices yield the same anisotropic local Hardy and inhomogeneous Triebel-Lizorkin spaces. The characterization is in terms of the coarse equivalence of certain quasi-norms associated to the matrices. For nondiagonal matrices, these conditions are strictly weaker than those classifying the coincidence of the corresponding homogeneous function spaces. The obtained results complete the classification of anisotropic Besov and Triebel-Lizorkin spaces associated to general expansive matrices.

Submitted 27 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.01356 [pdf, other]

Upper and lower bounds for the Lipschitz constant of random neural networks

Authors: Paul Geuchen, Thomas Heindl, Dominik Stöger, Felix Voigtlaender

Abstract: Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. In this paper, we study upper and lower bounds for the Lipschitz constant of random ReLU neural networks. Specifically, we assume that the weights and biases follow a generalization of the He initialization, where general symmetric distributions for the biases are permitted. For shallow neural networks, we characterize the Lipschitz constant up to an absolute numerical constant. For deep networks with fixed depth and sufficiently large width, our established upper bound is larger than the lower bound by a factor that is logarithmic in the width.

Submitted 18 January, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

MSC Class: 68T07; 26A16; 60B20; 60G15

arXiv:2303.16813 [pdf, other]

Optimal approximation using complex-valued neural networks

Authors: Paul Geuchen, Felix Voigtlaender

Abstract: Complex-valued neural networks (CVNNs) have recently shown promising empirical success, for instance for increasing the stability of recurrent neural networks and for improving the performance in tasks with complex-valued inputs, such as in MRI fingerprinting. While the overwhelming success of Deep Learning in the real-valued case is supported by a growing mathematical foundation, such a foundation is still largely lacking in the complex-valued case. We thus analyze the expressivity of CVNNs by studying their approximation properties. Our results yield the first quantitative approximation bounds for CVNNs that apply to a wide class of activation functions including the popular modReLU and complex cardioid activation functions. Precisely, our results apply to any activation function that is smooth but not polyharmonic on some non-empty open set; this is the natural generalization of the class of smooth and non-polynomial activation functions to the complex setting. Our main result shows that the error for the approximation of $C^k$-functions scales as $m^{-k/(2n)}$ for $m \to \infty$ where $m$ is the number of neurons, $k$ the smoothness of the target function and $n$ is the (complex) input dimension. Under a natural continuity assumption, we show that this rate is optimal; we further discuss the optimality when dropping this assumption. Moreover, we prove that the problem of approximating $C^k$-functions using continuous approximation methods unavoidably suffers from the curse of dimensionality.

Submitted 30 October, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: accepted at NeurIPS 2023

MSC Class: 68T07; 41A30; 41A63; 31A30; 30E10

arXiv:2212.00445 [pdf, ps, other]

Sampling numbers of smoothness classes via $\ell^1$-minimization

Authors: Thomas Jahn, Tino Ullrich, Felix Voigtlaender

Abstract: Using techniques developed recently in the field of compressed sensing we prove new upper bounds for general (nonlinear) sampling numbers of (quasi-)Banach smoothness spaces in $L^2$. In particular, we show that in relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in $L^2$ can be upper bounded by best $n$-term trigonometric widths in $L^\infty$. We describe a recovery procedure from $m$ function values based on $\ell^1$-minimization (basis pursuit denoising). With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting we see an additional speed-up of $m^{-1/2}$ (up to log factors) compared to linear methods in case of weighted Wiener spaces. For their quasi-Banach counterparts even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows one to recover mixed smoothness Sobolev functions belonging to $S^r_pW(\mathbb{T}^d)$ on the $d$-torus with a logarithmically better rate of convergence than any linear method can achieve when $1 < p < 2$ and $d$ is large. This effect is not present for isotropic Sobolev spaces.

Submitted 31 July, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.04936 [pdf, ps, other]

Classification of anisotropic Triebel-Lizorkin spaces

Authors: Sarah Koppensteiner, Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: This paper provides a classification theorem for expansive matrices $A \in \mathrm{GL}(d, \mathbb{R})$ generating the same anisotropic homogeneous Triebel-Lizorkin space $\dot{\mathbf{F}}^α_{p, q}(A)$ for $α\in \mathbb{R}$ and $p,q \in (0,\infty]$. It is shown that $\dot{\mathbf{F}}^α_{p, q}(A) = \dot{\mathbf{F}}^α_{p, q}(B)$ if and only if the homogeneous quasi-norms $ρ_A, ρ_B$ associated to the matrices $A, B$ are equivalent, except for the case $\dot{\mathbf{F}}^0_{p, 2} = L^p$ with $p \in (1,\infty)$. The obtained results complement and extend the classification of anisotropic Hardy spaces $H^p(A) = \dot{\mathbf{F}}^{0}_{p,2}(A)$, $p \in (0,1]$, in [Mem. Am. Math. Soc. 781, 122 p. (2003)].

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2208.07605 [pdf, ps, other]

$L^p$ sampling numbers for the Fourier-analytic Barron space

Authors: Felix Voigtlaender

Abstract: In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of smoothness $σ> 0$, which are functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(ξ) \, e^{2 πi \langle x, ξ\rangle} \, d ξ \quad \text{with} \quad \int_{\mathbb{R}^d} |F(ξ)| \cdot (1 + |ξ|)^σ \, d ξ< \infty. \] For $σ= 1$, these functions play a prominent role in machine learning, since they can be efficiently approximated by (shallow) neural networks without suffering from the curse of dimensionality. For these functions, we study the following question: Given $m$ point samples $f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to \mathbb{R}$ of smoothness $σ$, how well can $f$ be recovered from these samples, for an optimal choice of the sampling points and the reconstruction procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m (σ; L^p)$, we show that \[ m^{- \frac{1}{\max \{ p,2 \}} - \frac{σ}{d}} \lesssim s_m(σ;L^p) \lesssim (\ln (e + m))^{α(σ,d) / p} \cdot m^{- \frac{1}{\max \{ p,2 \}} - \frac{σ}{d}} , \] where the implied constants only depend on $σ$ and $d$ and where $α(σ,d)$ stays bounded as $d \to \infty$.

Submitted 16 August, 2022; originally announced August 2022.

MSC Class: 94A20; 41A46; 46E15; 42B35; 41A25; 65D15; 41A63

arXiv:2208.01342 [pdf, ps, other]

Coorbit theory of warped time-frequency systems in $\mathbb{R}^d$

Authors: Nicki Holighaus, Felix Voigtlaender

Abstract: Warped time-frequency systems have recently been introduced as a class of structured continuous frames for functions on the real line. Herein, we generalize this framework to the setting of functions of arbitrary dimensionality. After showing that the basic properties of warped time-frequency representations carry over to higher dimensions, we determine conditions on the warping function which guarantee that the associated Gramian is well-localized, so that associated families of coorbit spaces can be constructed. We then show that discrete Banach frame decompositions for these coorbit spaces can be obtained by sampling the continuous warped time-frequency systems. In particular, this implies that sparsity of a given function $f$ in the discrete warped time-frequency dictionary is equivalent to membership of $f$ in the coorbit space. We put special emphasis on the case of radial warping functions, for which the relevant assumptions simplify considerably.

Submitted 24 April, 2024; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: Revised version

MSC Class: 42B35; 42C15; 46F05; 46F12; 94A20

arXiv:2205.13531 [pdf, other]

Learning ReLU networks to high uniform accuracy is intractable

Authors: Julius Berner, Philipp Grohs, Felix Voigtlaender

Abstract: Statistical learning theory provides bounds on the necessary number of training samples needed to reach a prescribed accuracy in a learning problem formulated over a given target class. This accuracy is typically measured in terms of a generalization error, that is, an expected value of a given loss function. However, for several applications -- for example in a security-critical context or for problems in the computational sciences -- accuracy in this sense is not sufficient. In such cases, one would like to have guarantees for high accuracy on every input value, that is, with respect to the uniform norm. In this paper we precisely quantify the number of training samples needed for any conceivable training algorithm to guarantee a given uniform accuracy on any learning problem formulated over target classes containing (or consisting of) ReLU neural networks of a prescribed architecture. We prove that, under very general assumptions, the minimal number of training samples for this task scales exponentially both in the depth and the input dimension of the network architecture.

Submitted 28 February, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Accepted at ICLR 2023

arXiv:2204.10110 [pdf, ps, other]

Anisotropic Triebel-Lizorkin spaces and wavelet coefficient decay over one-parameter dilation groups, II

Authors: Sarah Koppensteiner, Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: Continuing previous work, this paper provides maximal characterizations of anisotropic Triebel-Lizorkin spaces $\dot{\mathbf{F}}^α_{p,q}$ for the endpoint case of $p = \infty$ and the full scale of parameters $α\in \mathbb{R}$ and $q \in (0,\infty]$. In particular, a Peetre-type characterization of the anisotropic Besov space $\dot{\mathbf{B}}^α_{\infty,\infty} = \dot{\mathbf{F}}^α_{\infty,\infty}$ is obtained. As a consequence, it is shown that there exist dual molecular frames and Riesz sequences in $\dot{\mathbf{F}}^α_{\infty,q}$.

Submitted 18 January, 2023; v1 submitted 21 April, 2022; originally announced April 2022.

arXiv:2203.07959 [pdf, ps, other]

Coorbit spaces associated to quasi-Banach function spaces and their molecular decomposition

Authors: Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: This paper provides a self-contained exposition of coorbit spaces associated to integrable group representations and quasi-Banach function spaces, and at the same time extends and simplifies previous work. The main results provide an extension of the theory in [Studia Math., 180(3):237-253, 2007] from groups admitting a compact, conjugation-invariant unit neighborhood to arbitrary (possibly nonunimodular) locally compact groups. In addition, the present paper establishes the existence of molecular dual frames and Riesz sequences as in [J. Funct. Anal., 280(10):56, 2021] for the full scale of quasi-Banach function spaces. The theory is developed for possibly projective and reducible unitary representations in order to be easily applicable to well-studied function spaces not satisfying the classical assumptions of coorbit theory. Compared to the existing literature on quasi-Banach coorbit spaces, all our results apply under significantly weaker integrability conditions on the analyzing vectors, which allows for obtaining sharp results in concrete settings.

Submitted 23 February, 2024; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: To appear in Mémoires de la Société Mathématique de France

arXiv:2201.11705 [pdf, ps, other]

A fractal uncertainty principle for Bergman spaces and analytic wavelets

Authors: Luis Daniel Abreu, Zouhair Mouayn, Felix Voigtlaender

Abstract: Motivated by results of Dyatlov on Fourier uncertainty principles for Cantor sets and by similar results of Knutsen for joint time-frequency representations (i.e., the short-time Fourier transform (STFT) with a Gaussian window, equivalent to Fock spaces), we suggest a general setting relating localization and uncertainty and prove, within this context, an uncertainty principle for Cantor sets in Bergman spaces on the unit disk, where the Cantor set is defined as a union of annuli that are equidistributed in the hyperbolic measure. The result can be written in terms of analytic Cauchy wavelets. As in the case of the STFT considered by Knutsen, our result consists of a two-sided bound for the norm of a localization operator involving the fractal dimension log 2 / log 3 in the exponent. As in the STFT case and in Dyatlov's fractal uncertainty principle, the (hyperbolic) measure of the dilated iterates of the Cantor set in the disk tends to infinity, while the corresponding norm of the localization operator tends to zero.

Submitted 30 August, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 18

arXiv:2112.12555 [pdf, ps, other]

Optimal learning of high-dimensional classification problems using deep neural networks

Authors: Philipp Petersen, Felix Voigtlaender

Abstract: We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.

Submitted 24 December, 2021; v1 submitted 23 December, 2021; originally announced December 2021.

MSC Class: 68T05; 62C20; 41A25; 41A46

arXiv:2110.15304 [pdf, ps, other]

Sobolev-type embeddings for neural network approximation spaces

Authors: Philipp Grohs, Felix Voigtlaender

Abstract: We consider neural network approximation spaces that classify functions according to the rate at which they can be approximated (with error measured in $L^p$) by ReLU neural networks with an increasing number of coefficients, subject to bounds on the magnitude of the coefficients and the number of hidden layers. We prove embedding theorems between these spaces for different values of $p$. Furthermore, we derive sharp embeddings of these approximation spaces into Hölder spaces. We find that, analogous to the case of classical function spaces (such as Sobolev spaces, or Besov spaces) it is possible to trade "smoothness" (i.e., approximation rate) for increased integrability. Combined with our earlier results in [arXiv:2104.02746], our embedding theorems imply a somewhat surprising fact related to "learning" functions from a given neural network space based on point samples: if accuracy is measured with respect to the uniform norm, then an optimal "learning" algorithm for reconstructing functions that are well approximable by ReLU neural networks is simply given by piecewise constant interpolation on a tensor product grid.

Submitted 28 October, 2021; originally announced October 2021.

MSC Class: 68T07; 46E35; 65D05; 46E30

arXiv:2106.02365 [pdf, ps, other]

A note on the invertibility of the Gabor frame operator on certain modulation spaces

Authors: Dae Gwan Lee, Friedrich Philipp, Felix Voigtlaender

Abstract: We consider Gabor frames generated by a general lattice and a window function that belongs to one of the following spaces: the Sobolev space $V_1 = H^1(\mathbb R^d)$, the weighted $L^2$-space $V_2 = L_{1 + |x|}^2(\mathbb R^d)$, and the space $V_3 = \mathbb H^1(\mathbb R^d) = V_1 \cap V_2$ consisting of all functions with finite uncertainty product; all these spaces can be described as modulation spaces with respect to suitable weighted $L^2$ spaces. In all cases, we prove that the space of Bessel vectors in $V_j$ is mapped bijectively onto itself by the Gabor frame operator. As a consequence, if the window function belongs to one of the three spaces, then the canonical dual window also belongs to the same space. In fact, the result not only applies to frames, but also to frame sequences.

Submitted 4 June, 2021; originally announced June 2021.

Comments: 17 pages

MSC Class: 42C15; 42C40; 46E35; 46B15

arXiv:2104.14361 [pdf, ps, other]

Anisotropic Triebel-Lizorkin spaces and wavelet coefficient decay over one-parameter dilation groups, I

Authors: Sarah Koppensteiner, Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: This paper provides maximal function characterizations of anisotropic Triebel-Lizorkin spaces associated to general expansive matrices for the full range of parameters $p \in (0,\infty)$, $q \in (0,\infty]$ and $α\in \mathbb{R}$. The equivalent norm is defined in terms of the decay of wavelet coefficients, quantified by a Peetre-type space over a one-parameter dilation group. As an application, the existence of dual molecular frames and Riesz sequences is obtained; the wavelet systems are generated by translations and anisotropic dilations of a single function, where neither the translation nor dilation parameters are required to belong to a discrete subgroup. Explicit criteria for molecules are given in terms of mild decay, moment, and smoothness conditions.

Submitted 18 January, 2023; v1 submitted 29 April, 2021; originally announced April 2021.

arXiv:2104.02746 [pdf, other]

Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity bounds for Neural Network Approximation Spaces

Authors: Philipp Grohs, Felix Voigtlaender

Abstract: We study the computational complexity of (deterministic or randomized) algorithms based on point samples for approximating or integrating functions that can be well approximated by neural networks. Such algorithms (most prominently stochastic gradient descent and its variants) are used extensively in the field of deep learning. One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates by such algorithms. We answer this question in the negative by proving hardness results for the problems of approximation and integration on a novel class of neural network approximation spaces. In particular, our results confirm a conjectured and empirically observed theory-to-practice gap in deep learning. We complement our hardness results by showing that approximation rates of a comparable order of convergence are (at least theoretically) achievable.

Submitted 6 April, 2021; originally announced April 2021.

MSC Class: 41A46; 68T07; 41A65; 41A25; 68T05; 65Y20

arXiv:2102.13092 [pdf, other]

Quantitative approximation results for complex-valued neural networks

Authors: A. Caragea, D. G. Lee, J. Maly, G. Pfander, F. Voigtlaender

Abstract: Until recently, applications of neural networks in machine learning have almost exclusively relied on real-valued networks. It was recently observed, however, that complex-valued neural networks (CVNNs) exhibit superior performance in applications in which the input is naturally complex-valued, such as MRI fingerprinting. While the mathematical theory of real-valued networks has, by now, reached some level of maturity, this is far from true for complex-valued networks. In this paper, we analyze the expressivity of complex-valued networks by providing explicit quantitative error bounds for approximating $C^n$ functions on compact subsets of $\mathbb{C}^d$ by complex-valued neural networks that employ the modReLU activation function, given by $σ(z) = \mathrm{ReLU}(|z| - 1) \, \mathrm{sgn} (z)$, which is one of the most popular complex activation functions used in practice. We show that the derived approximation rates are optimal (up to log factors) in the class of modReLU networks with weights of moderate growth.
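
For illustration only (this is not taken from the paper), the modReLU activation stated in the abstract, $σ(z) = \mathrm{ReLU}(|z| - 1) \, \mathrm{sgn}(z)$, could be sketched in plain Python as follows. The function name `modrelu` and the convention $\mathrm{sgn}(0) = 0$ are assumptions made here.

```python
def modrelu(z: complex) -> complex:
    """modReLU as stated in the abstract: ReLU(|z| - 1) * sgn(z).

    Assumption (not from the source): sgn(z) = z / |z| for z != 0
    and sgn(0) = 0.
    """
    r = abs(z)
    if r <= 1.0:
        # ReLU(|z| - 1) vanishes on the closed unit disc.
        return 0j
    # Shrink the modulus by 1 and keep the phase of z.
    return (r - 1.0) * (z / r)


if __name__ == "__main__":
    print(modrelu(3 + 4j))
    print(modrelu(0.5j))
```

Under this reading, the activation preserves the phase of $z$, reduces its modulus by 1, and maps the closed unit disc to 0.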

Submitted 3 December, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

MSC Class: 68T07; 41A25; 41A46

arXiv:2012.03351 [pdf, ps, other]

The universal approximation theorem for complex-valued neural networks

Authors: Felix Voigtlaender

Abstract: We generalize the classical universal approximation theorem for neural networks to the case of complex-valued neural networks. Precisely, we consider feedforward networks with a complex activation function $σ: \mathbb{C} \to \mathbb{C}$ in which each neuron performs the operation $\mathbb{C}^N \to \mathbb{C}, z \mapsto σ(b + w^T z)$ with weights $w \in \mathbb{C}^N$ and a bias $b \in \mathbb{C}$, and with $σ$ applied componentwise. We completely characterize those activation functions $σ$ for which the associated complex networks have the universal approximation property, meaning that they can uniformly approximate any continuous function on any compact subset of $\mathbb{C}^d$ arbitrarily well. Unlike the classical case of real networks, the set of "good activation functions" which give rise to networks with the universal approximation property differs significantly depending on whether one considers deep networks or shallow networks: For deep networks with at least two hidden layers, the universal approximation property holds as long as $σ$ is neither a polynomial, a holomorphic function, nor an antiholomorphic function. Shallow networks, on the other hand, are universal if and only if the real part or the imaginary part of $σ$ is not a polyharmonic function.

Submitted 11 December, 2022; v1 submitted 6 December, 2020; originally announced December 2020.

MSC Class: 68T07; 41A30; 41A63; 31A30; 30E10

arXiv:2011.09363 [pdf, other]

Neural network approximation and estimation of classifiers with classification boundary in a Barron class

Authors: Andrei Caragea, Philipp Petersen, Felix Voigtlaender

Abstract: We prove bounds for the approximation and estimation of certain binary classification functions using ReLU neural networks. Our estimation bounds provide a priori performance guarantees for empirical risk minimization using networks of a suitable size, depending on the number of training samples available. The obtained approximation and estimation rates are independent of the dimension of the input, showing that the curse of dimensionality can be overcome in this setting; in fact, the input dimension only enters in the form of a polynomial factor. Regarding the regularity of the target classification function, we assume the interfaces between the different classes to be locally of Barron-type. We complement our results by studying the relations between various Barron-type spaces that have been proposed in the literature. These spaces differ substantially more from each other than the current literature suggests.

Submitted 10 March, 2022; v1 submitted 18 November, 2020; originally announced November 2020.

MSC Class: 68T07; 41A25; 41A46; 42B35; 46E15

arXiv:2008.01011 [pdf, other]

Phase Transitions in Rate Distortion Theory and Deep Learning

Authors: Philipp Grohs, Andreas Klotz, Felix Voigtlaender

Abstract: Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R\to\infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fixed coding scheme, there usually are elements of $\mathcal{S}$ that are compressed at a higher rate than $s^\ast(\mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs: We construct a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that for every coding scheme $\mathcal{C}$ and any $s >s^\ast(\mathcal{S})$, the set of signals encoded with error $\mathcal{O}(R^{-s})$ by $\mathcal{C}$ forms a $\mathbb{P}$-null-set. In particular our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(Ω)$ for a bounded Lipschitz domain $Ω$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $f\in\mathcal{S}$ can be encoded to within accuracy $\varepsilon$ using $R$ bits.
This result is applied to the problem of approximately representing $f\in\mathcal{S}$ to within accuracy $\varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any $s >s^\ast(\mathcal{S})$ there are constants $c,C$ such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by $\min\big\{1,2^{C\cdot W\lceil\log_2(1+W)\rceil^2 -c\cdot\varepsilon^{-1/s}}\big\}$.

Submitted 3 August, 2020; originally announced August 2020.

MSC Class: 41A46; 28C20; 68P30

arXiv:2006.01083 [pdf, ps, other]

Schur-type Banach modules of integral kernels acting on mixed-norm Lebesgue spaces

Authors: Nicki Holighaus, Felix Voigtlaender

Abstract: Schur's test states that if $K:X\times Y\to\mathbb{C}$ satisfies $\int_Y |K(x,y)|dν(y)\leq C$ and $\int_X |K(x,y)|dμ(x)\leq C$, then the associated integral operator acts boundedly on $L^p$ for all $p\in [1,\infty]$. We derive a variant of this result ensuring boundedness on the (weighted) mixed-norm Lebesgue spaces $L_w^{p,q}$ for all $p,q\in [1,\infty]$. For non-negative integral kernels our criterion is sharp; i.e., it is satisfied if and only if the integral operator acts boundedly on all of the mixed-norm Lebesgue spaces. Motivated by this criterion, we introduce solid Banach modules $\mathcal{B}_m(X,Y)$ of integral kernels such that all kernels in $\mathcal{B}_m(X,Y)$ map $L_w^{p,q}(ν)$ boundedly into $L_v^{p,q}(μ)$ for all $p,q \in [1,\infty]$, provided that the weights $v,w$ are $m$-moderate. Conversely, if $\mathbf{A}$ and $\mathbf{B}$ are solid Banach spaces for which all kernels $K\in\mathcal{B}_m(X,Y)$ map $\mathbf{A}$ into $\mathbf{B}$, then $\mathbf{A}$ and $\mathbf{B}$ are related to mixed-norm Lebesgue-spaces; i.e., $\left(L^1\cap L^\infty\cap L^{1,\infty}\cap L^{\infty,1}\right)_v\hookrightarrow\mathbf{B}$ and $\mathbf{A}\hookrightarrow\left(L^1 + L^\infty + L^{1,\infty} + L^{\infty,1}\right)_{1/w}$ for certain weights $v,w$ depending on the weight $m$.
The kernel algebra $\mathcal{B}_m(X,X)$ is particularly suited for applications in (generalized) coorbit theory: Usually, a host of technical conditions need to be verified to guarantee that coorbit space theory is applicable for a given continuous frame $Ψ$ and a Banach space $\mathbf{A}$. We show that it is enough to check that certain integral kernels associated to $Ψ$ belong to $\mathcal{B}_m(X,X)$; this ensures that the coorbit spaces $\operatorname{Co}_Ψ(L_κ^{p,q})$ are well-defined for all $p,q\in [1,\infty]$ and all weights $κ$ compatible with $m$.

Submitted 18 November, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: Added appendix on sharpness for complex-valued integral kernels

MSC Class: 47G10; 47L80; 46E30; 47L10

arXiv:2001.09609 [pdf, ps, other]

On dual molecules and convolution-dominated operators

Authors: José Luis Romero, Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: We show that sampling or interpolation formulas in reproducing kernel Hilbert spaces can be obtained by reproducing kernels whose dual systems form molecules, ensuring that the size profile of a function is fully reflected by the size profile of its sampled values. The main tool is a local holomorphic calculus for convolution-dominated operators, valid for groups with possibly non-polynomial growth. Applied to the matrix coefficients of a group representation, our methods improve on classical results on atomic decompositions and bridge a gap between abstract and concrete methods.

Submitted 18 February, 2021; v1 submitted 27 January, 2020; originally announced January 2020.

MSC Class: 22A10; 42C15; 42C40; 43A15; 46E22

Journal ref: Journal of Functional Analysis, 280(10):108963, 56, 2021

arXiv:1905.04934 [pdf, ps, other]

Invertibility of frame operators on Besov-type decomposition spaces

Authors: José Luis Romero, Jordy Timo van Velthoven, Felix Voigtlaender

Abstract: We derive an extension of the Walnut-Daubechies criterion for the invertibility of frame operators. The criterion concerns general reproducing systems and Besov-type spaces. As an application, we conclude that $L^2$ frame expansions associated with smooth and fast-decaying reproducing systems on sufficiently fine lattices extend to Besov-type spaces. This simplifies and improves recent results on the existence of atomic decompositions, which only provide a particular dual reproducing system with suitable properties. In contrast, we conclude that the $L^2$ canonical frame expansions extend to many other function spaces, and, therefore, operations such as analyzing using the frame, thresholding the resulting coefficients, and then synthesizing using the canonical dual frame are bounded on these spaces.

Submitted 28 January, 2022; v1 submitted 13 May, 2019; originally announced May 2019.

MSC Class: 42B35; 42C15; 42C40

Journal ref: Journal of Geometric Analysis, 32(5):Paper No. 149, 2022

arXiv:1905.01208 [pdf, other]

Approximation spaces of deep neural networks

Authors: Rémi Gribonval, Gitta Kutyniok, Morten Nielsen, Felix Voigtlaender

Abstract: We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of "skip connections" does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

Submitted 17 July, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

arXiv:1904.12345 [pdf, ps, other]

Time-Frequency Shift Invariance of Gabor Spaces with an $S_0$-Generator

Authors: Andrei Caragea, Dae Gwan Lee, Friedrich Philipp, Felix Voigtlaender

Abstract: We consider Gabor Riesz sequences generated by a lattice $Λ\subset \mathbb{R}^2$ and a window function $g \in L^2(\mathbb{R})$ which is well localized in both time and frequency. When $g$ belongs to the Feichtinger algebra, we prove that only those time-frequency shifts with parameters from the lattice $Λ$ leave the corresponding Gabor space invariant. This improves on earlier results where only lattices of rational density were considered. A slightly weaker result is proved - again for lattices of general density - under the regularity assumptions of the classical Balian-Low theorem, where both $g$ and its Fourier transform belong to the Sobolev space $H^1(\mathbb{R})$. The proof relies on a combination of methods from time-frequency analysis and the theory of $C^\ast$-algebras, specifically the so-called irrational rotation algebra.

Submitted 19 July, 2022; v1 submitted 28 April, 2019; originally announced April 2019.

MSC Class: 42C15; 42C30; 42C40

arXiv:1904.12250 [pdf, ps, other]

A quantitative subspace Balian-Low theorem

Authors: Andrei Caragea, Dae Gwan Lee, Friedrich Philipp, Felix Voigtlaender

Abstract: Let $\mathcal G\subset L^2(\mathbb R)$ be the subspace spanned by a Gabor Riesz sequence $(g,Λ)$ with $g\in L^2(\mathbb R)$ and a lattice $Λ\subset\mathbb R^2$ of rational density. It was shown recently that if $g$ is well-localized both in time and frequency, then $\mathcal G$ cannot contain any time-frequency shift $π(z) g$ of $g$ with $z\notin Λ$. In this paper, we improve the result to the quantitative statement that the $L^2$-distance of $π(z)g$ to the space $\mathcal G$ is equivalent to the Euclidean distance of $z$ to the lattice $Λ$, in the sense that the ratio between those two distances is uniformly bounded above and below by positive constants. On the way, we prove several results of independent interest, one of them being closely related to the so-called weak Balian-Low theorem for subspaces.

Submitted 3 June, 2021; v1 submitted 27 April, 2019; originally announced April 2019.

Comments: 37 pages

MSC Class: Primary: 42C15. Secondary: 42C30; 42C40

arXiv:1904.10687 [pdf, other]

Design and properties of wave packet smoothness spaces

Authors: Dimitri Bytchenkoff, Felix Voigtlaender

Abstract: We introduce a family of quasi-Banach spaces - which we call wave packet smoothness spaces - that includes those function spaces which can be characterised by the sparsity of their expansions in Gabor frames, wave atoms, and many other frame constructions. We construct Banach frames for and atomic decompositions of the wave packet smoothness spaces and study their embeddings in each other and in a few more classical function spaces such as Besov and Sobolev spaces.

Submitted 24 April, 2019; originally announced April 2019.

Comments: accepted for publication in Journal de Mathématiques Pures et Appliquées

MSC Class: 42B35; 42C15; 46E15; 46E35; 42C40

arXiv:1904.04789 [pdf, ps, other]

Approximation in $L^p(μ)$ with deep ReLU neural networks

Authors: Felix Voigtlaender, Philipp Petersen

Abstract: We discuss the expressive power of neural networks which use the non-smooth ReLU activation function $\varrho(x) = \max\{0,x\}$ by analyzing the approximation theoretic properties of such networks. The existing results mainly fall into two categories: approximation using ReLU networks with a fixed depth, or using ReLU networks whose depth increases with the approximation accuracy. After reviewing these findings, we show that the results concerning networks with fixed depth (which up to now only consider approximation in $L^p(λ)$ for the Lebesgue measure $λ$) can be generalized to approximation in $L^p(μ)$, for any finite Borel measure $μ$. In particular, the generalized results apply in the usual setting of statistical learning theory, where one is interested in approximation in $L^2(\mathbb{P})$, with the probability measure $\mathbb{P}$ describing the distribution of the data.

Submitted 9 April, 2019; originally announced April 2019.

Comments: Accepted for presentation at SampTA 2019

MSC Class: 41A25; 82C32; 41A46

arXiv:1904.01476 [pdf, other]

5G for the Factory of the Future: Wireless Communication in an Industrial Environment

Authors: Florian Voigtländer, Ali Ramadan, Joseph Eichinger, Jürgen Grotepass, Karthikeyan Ganesan, Federico Diez Canseco, Dirk Pensky, Alois Knoll

Abstract: In this paper, the application of 5G communication technology in an industrial environment is discussed. It acts as an enabler for the separation of sensors/actors and resources, like memory and computational power. 5G offers characteristics essential for the proposed approach, like robustness, ultra-low latency, high data rates and a massive number of devices. A demonstrator of a production line was used as a test environment for 5G in a real-world industrial application. A wide variety of heterogeneous sensor systems is used by a mobile robot platform. The collected data is transmitted via a 5G network to various Cloud systems. The product is treated as a cyber-physical system with an RFID tag in conjunction with the product memory system. The dynamic production flow approach is discussed, centered around the robot which is used for transportation and inspection of products. This inspection is performed during the transportation and influences the production flow directly. This is desirable in the scope of Industry 4.0 to have an efficient production down to batch size 1.

Submitted 14 March, 2019; originally announced April 2019.

Comments: "Presented at ICCR2018: International Conference on Cloud and Robotics 2018 (arXiv:1903.04824v1)"

Report number: ICCR/2018/01

arXiv:1810.10032 [pdf, ps, other]

Negative results for approximation using single layer and multilayer feedforward neural networks

Authors: J. M. Almira, P. E. Lopez-de-Teruel, D. J. Romero-Lopez, F. Voigtlaender

Abstract: We prove a negative result for the approximation of functions defined on compact subsets of $\mathbb{R}^d$ (where $d \geq 2$) using feedforward neural networks with one hidden layer and arbitrary continuous activation function. In a nutshell, this result claims the existence of target functions that are as difficult to approximate using these neural networks as one may want. We also demonstrate an analogous result (for general $d \in \mathbb{N}$) for neural networks with an arbitrary number of hidden layers, for activation functions that are either rational functions or continuous splines with finitely many pieces.

Submitted 25 August, 2020; v1 submitted 23 October, 2018; originally announced October 2018.

Comments: 12 pages, submitted to a Journal

arXiv:1809.10437 [pdf, other]

doi 10.1051/0004-6361/201834041

Learning sparse representations on the sphere

Authors: Florent Sureau, Felix Voigtlaender, Malte Wust, Jean-Luc Starck, Gitta Kutyniok

Abstract: Many representation systems on the sphere have been proposed in the past, such as spherical harmonics, wavelets, or curvelets. Each of these data representations is designed to extract a specific set of features, and choosing the best fixed representation system for a given scientific application is challenging. In this paper, we show that we can directly learn a representation system from given data on the sphere. We propose two new adaptive approaches: the first is a (potentially multi-scale) patch-based dictionary learning approach, and the second consists in selecting a representation among a parametrized family of representations, the α-shearlets. We investigate their relative performance to represent and denoise complex structures on different astrophysical data sets on the sphere.

Submitted 27 September, 2018; originally announced September 2018.

arXiv:1809.00973 [pdf, other]

Equivalence of approximation by convolutional neural networks and fully-connected networks

Authors: Philipp Petersen, Felix Voigtlaender

Abstract: Convolutional neural networks are the most widely used type of neural networks in applications. In mathematical analysis, however, mostly fully-connected networks are studied. In this paper, we establish a connection between both network architectures. Using this connection, we show that all upper and lower bounds concerning approximation rates of fully-connected neural networks for functions $f \in \mathcal{C}$ -- for an arbitrary function class $\mathcal{C}$ -- translate to essentially the same bounds concerning approximation rates of convolutional neural networks for functions $f \in {\mathcal{C}^{equi}}$, with the class ${\mathcal{C}^{equi}}$ consisting of all translation equivariant functions whose first coordinate belongs to $\mathcal{C}$. All presented results consider exclusively the case of convolutional neural networks without any pooling operation and with circular convolutions, i.e., not based on zero-padding.

Submitted 28 January, 2021; v1 submitted 4 September, 2018; originally announced September 2018.

MSC Class: 41A25; 44A35; 41A46

Journal ref: Proc. Amer. Math. Soc. 148 (2020), 1567-1581

arXiv:1807.06380 [pdf, ps, other]

doi 10.1007/978-3-030-05210-2

On the Atomic Decomposition of Coorbit Spaces with Non-Integrable Kernel

Authors: Stephan Dahlke, Filippo De Mari, Ernesto De Vito, Lukas Sawatzki, Gabriele Steidl, Gerd Teschke, Felix Voigtlaender

Abstract: This paper is concerned with recent progress in the context of coorbit space theory. Based on a square integrable group representation, the coorbit theory provides new families of associated smoothness spaces, where the smoothness of a function is measured by the decay of the associated voice transform. Moreover, by discretizing the representation, atomic decompositions and Banach frames can be constructed. Usually, the whole machinery works well if the associated reproducing kernel is integrable with respect to a weighted Haar measure on the group. In recent studies, it has turned out that to some extent coorbit spaces can still be established if this condition is violated. In this paper, we clarify in which sense atomic decompositions and Banach frames for these generalized coorbit spaces can be obtained.

Submitted 17 July, 2018; originally announced July 2018.

MSC Class: 46E35; 43A15; 42B35; 22D10; 42B15

arXiv:1806.08459 [pdf, ps, other]

Topological properties of the set of functions generated by neural networks of fixed size

Authors: Philipp Petersen, Mones Raslan, Felix Voigtlaender

Abstract: We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0 < p < \infty$, for all practically-used activation functions, and also not closed with respect to the $L^\infty$-norm for all practically-used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if $f_1, f_2$ are two functions realized by neural networks and if $f_1, f_2$ are close in the sense that $\|f_1 - f_2\|_{L^\infty} \leq \varepsilon$ for $\varepsilon > 0$, it is, regardless of the size of $\varepsilon$, usually not possible to find weights $w_1, w_2$ close together such that each $f_i$ is realized by a neural network with weights $w_i$. Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.

Submitted 23 January, 2020; v1 submitted 21 June, 2018; originally announced June 2018.

MSC Class: 54H99; 68T05; 52A30

arXiv:1710.03576 [pdf, ps, other]

A general version of Price's theorem

Authors: Felix Voigtlaender

Abstract: Assume that $X_Σ \in \mathbb{R}^{n}$ is a centered random vector following a multivariate normal distribution with positive definite covariance matrix $Σ$. Let $g : \mathbb{R}^{n} \to \mathbb{C}$ be measurable and of moderate growth, say $|g(x)| \lesssim (1 + |x|)^{N}$. We show that the map $Σ\mapsto \mathbb{E}[g(X_Σ)]$ is smooth, and we derive convenient expressions for its partial derivatives, in terms of certain expectations $\mathbb{E}[(\partial^αg)(X_Σ)]$ of partial (distributional) derivatives of $g$. As we discuss, this result can be used to derive bounds for the expectation $\mathbb{E}[g(X_Σ)]$ of a nonlinear function $g(X_Σ)$ of a Gaussian random vector $X_Σ$ with possibly correlated entries. For the case when $g\left(x\right) = g_{1}(x_{1}) \cdots g_{n}(x_{n})$ has tensor-product structure, the above result is known in the engineering literature as Price's theorem, originally published in 1958. For dimension $n = 2$, it was generalized in 1964 by McMahon to the general case $g : \mathbb{R}^{2} \to \mathbb{C}$. Our contribution is to unify these results, and to give a mathematically fully rigorous proof. Precisely, we consider a normally distributed random vector $X_Σ \in \mathbb{R}^{n}$ of arbitrary dimension $n \in \mathbb{N}$, and we allow the nonlinearity $g$ to be a general tempered distribution. To this end, we replace the expectation $\mathbb{E}\left[g(X_Σ)\right]$ by the dual pairing $\left\langle g,\,φ_Σ\right\rangle_{\mathcal{S}',\mathcal{S}}$, where $φ_Σ$ denotes the probability density function of $X_Σ$.

Submitted 1 June, 2020; v1 submitted 10 October, 2017; originally announced October 2017.

Comments: Accepted for publication in "Journal of Theoretical Probability"

MSC Class: 60G15; 62H20

arXiv:1709.05289 [pdf, ps, other]

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

Authors: Philipp Petersen, Felix Voigtlaender

Abstract: We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^β(\mathbb R^d)$ of possibly discontinuous piecewise $C^β$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, where the different smooth regions of $f$ are separated by $C^β$ hypersurfaces. For dimension $d \geq 2$, regularity $β> 0$, and accuracy $\varepsilon > 0$, we construct artificial neural networks with ReLU activation function that approximate functions from $\mathcal{E}^β(\mathbb R^d)$ up to $L^2$ error of $\varepsilon$. The constructed networks have a fixed number of layers, depending only on $d$ and $β$, and they have $O(\varepsilon^{-2(d-1)/β})$ many nonzero weights, which we prove to be optimal. In addition to the optimality in terms of the number of weights, we show that in order to achieve the optimal approximation rate, one needs ReLU networks of a certain depth. Precisely, for piecewise $C^β(\mathbb R^d)$ functions, this minimal depth is given---up to a multiplicative constant---by $β/d$. Up to a log factor, our constructed networks match this bound. This partly explains the benefits of depth for ReLU networks by showing that deep networks are necessary to achieve efficient approximation of (piecewise) smooth functions. Finally, we analyze approximation in high-dimensional spaces where the function $f$ to be approximated can be factorized into a smooth dimension reducing feature map $τ$ and classifier function $g$---defined on a low-dimensional feature space---as $f = g \circ τ$. We show that in this case the approximation rate depends only on the dimension of the feature space and not the input dimension.

Submitted 22 May, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

Comments: Generalized some estimates to $L^p$ norms for $0<p<\infty$

MSC Class: 41A25; 41A10; 82C32; 41A46; 68T05

arXiv:1702.03559 [pdf, ps, other]

Analysis vs. synthesis sparsity for $α$-shearlets

Authors: Felix Voigtlaender, Anne Pein

Abstract: There are two notions of sparsity associated to a frame $Ψ=(ψ_i)_{i\in I}$: Analysis sparsity of $f$ means that the analysis coefficients $(\langle f,ψ_i\rangle)_i$ are sparse, while synthesis sparsity means that $f=\sum_i c_iψ_i$ with sparse coefficients $(c_i)_i$. Here, sparsity of $c=(c_i)_i$ means $c\in\ell^p(I)$ for a given $p<2$. We show that both notions of sparsity coincide if $Ψ={\rm SH}(\varphi,ψ;δ)$ is a discrete (cone-adapted) shearlet frame with 'nice' generators $\varphi,ψ$ and fine enough sampling density $δ>0$. The required 'niceness' is explicitly quantified in terms of Fourier-decay and vanishing moment conditions. Precisely, we show that suitable shearlet systems simultaneously provide Banach frames and atomic decompositions for the shearlet smoothness spaces $\mathscr{S}_s^{p,q}$ introduced by Labate et al. Hence, membership in $\mathscr{S}_s^{p,q}$ is simultaneously equivalent to analysis sparsity and to synthesis sparsity w.r.t. the shearlet frame. As an application, we prove that shearlets yield (almost) optimal approximation rates for cartoon-like functions $f$: If $ε>0$, then $\Vert f-f_N\Vert_{L^2}\lesssim N^{-(1-ε)}$, where $f_N$ is a linear combination of $N$ shearlets. This might appear to be well-known, but the existing proofs only establish this approximation rate w.r.t. the dual $\tildeΨ$ of $Ψ$, not w.r.t. $Ψ$ itself. This is not completely satisfying, since the properties of $\tildeΨ$ (decay, smoothness, etc.) are largely unknown. We also consider $α$-shearlet systems. For these, the shearlet smoothness spaces have to be replaced by $α$-shearlet smoothness spaces. We completely characterize the embeddings between these spaces, allowing us to decide whether sparsity w.r.t. $α_1$-shearlets implies sparsity w.r.t. $α_2$-shearlets.

Submitted 12 February, 2017; originally announced February 2017.

MSC Class: 41A25; 41A30; 42C40; 42C15; 42B35

arXiv:1612.08772 [pdf, ps, other]

Structured, compactly supported Banach frame decompositions of decomposition spaces

Authors: Felix Voigtlaender

Abstract: $\newcommand{mc}[1]{\mathcal{#1}}$ $\newcommand{D}{\mc{D}(\mc{Q},L^p,\ell_w^q)}$ We present a framework for the construction of structured, possibly compactly supported Banach frames and atomic decompositions for decomposition spaces. Such a space $\D$ is defined using a frequency covering $\mc{Q}=(Q_i)_{i\in I}$: If $(\varphi_i)_{i}$ is a suitable partition of unity subordinate to $\mc{Q}$, then $\Vert g\Vert_{\D}:=\left\Vert\left(\Vert\mc{F}^{-1}(\varphi_i\hat{g})\Vert_{L^p}\right)_{i}\right\Vert_{\ell_w^q}$. We assume $\mc{Q}=(T_iQ+b_i)_{i}$, with $T_i\in{\rm GL}(\Bbb{R}^d),b_i\in\Bbb{R}^d$. Given a prototype $γ$, we consider the system \[Ψ_{c}=(L_{c\cdot T_i^{-T}k}γ^{[i]})_{i\in I,k\in\Bbb{Z}^d}\text{ with }γ^{[i]}=|\det T_i|^{1/2}\, M_{b_i}(γ\circ T_i^T),\] with translation $L_x$ and modulation $M_ξ$. We provide verifiable conditions on $γ$ under which $Ψ_c$ forms a Banach frame or an atomic decomposition for $\D$, for small enough sampling density $c>0$. Our theory allows compactly supported prototypes and applies for arbitrary $p,q\in(0,\infty]$. Often, $Ψ_c$ is both a Banach frame and an atomic decomposition, so that analysis sparsity is equivalent to synthesis sparsity, i.e. the analysis coefficients $(\langle f,L_{c\cdot T_i^{-T}k}γ^{[i]}\rangle)_{i,k}$ lie in $\ell^p$ iff $f$ belongs to a certain decomposition space, iff $f=\sum_{i,k}c_k^{(i)}\cdot L_{c\cdot T_i^{-T}k}γ^{[i]}$ with $(c_k^{(i)})_{i,k}\in\ell^p$. This is convenient if only analysis sparsity is known to hold: Generally, this only yields synthesis sparsity w.r.t. the dual frame, about which often only little is known. But our theory yields synthesis sparsity w.r.t. the well-understood primal frame. In particular, our theory applies to $α$-modulation spaces and inhomogeneous Besov spaces. It also applies to shearlet frames, as we show in a companion paper.

Submitted 27 December, 2016; originally announced December 2016.

MSC Class: 42B35; 42C15; 42C40; 46E15; 46E35

arXiv:1606.04924 [pdf, other]

From Frazier-Jawerth characterizations of Besov spaces to Wavelets and Decomposition spaces

Authors: Hans Georg Feichtinger, Felix Voigtlaender

Abstract: This article describes how the ideas promoted by the fundamental papers published by M. Frazier and B. Jawerth in the eighties have influenced subsequent developments related to the theory of atomic decompositions and Banach frames for function spaces such as the modulation spaces and Besov-Triebel-Lizorkin spaces. Both of these classes of spaces arise as special cases of two different, general constructions of function spaces: coorbit spaces and decomposition spaces. Coorbit spaces are defined by imposing certain decay conditions on the so-called voice transform of the function/distribution under consideration. As a concrete example, one might think of the wavelet transform, leading to the theory of Besov-Triebel-Lizorkin spaces. Decomposition spaces, on the other hand, are defined using certain decompositions in the Fourier domain. For Besov-Triebel-Lizorkin spaces, one uses a dyadic decomposition, while a uniform decomposition yields modulation spaces. Only recently, the second author has established a fruitful connection between modern variants of wavelet theory with respect to general dilation groups (which can be treated in the context of coorbit theory) and a particular family of decomposition spaces. In this way, optimal inclusion results and invariance properties for a variety of smoothness spaces can be established. We will present an outline of these connections and comment on the basic results arising in this context.

Submitted 15 June, 2016; originally announced June 2016.

MSC Class: 42C15; 46E35; 42C40

arXiv:1605.09705 [pdf, ps, other]

Embeddings of decomposition spaces

Authors: Felix Voigtlaender

Abstract: Many smoothness spaces in harmonic analysis are decomposition spaces. In this paper we ask: Given two decomposition spaces, is there an embedding between the two? A decomposition space $\mathcal{D}(\mathcal{Q}, L^p, Y)$ can be described using a covering $\mathcal{Q}=(Q_{i})_{i\in I}$ of the frequency domain, an exponent $p$ and a sequence space $Y\subset\mathbb{C}^{I}$. Given these, the decomposition space norm of a distribution $g$ is $\| g\| _{\mathcal{D}(\mathcal{Q}, L^p, Y)}=\left\| \left(\left\| \mathcal{F}^{-1}\left(\varphi_{i}\widehat{g}\right)\right\| _{L^{p}}\right)_{i\in I}\right\| _{Y}$, where $(\varphi_{i})_{i\in I}$ is a suitable partition of unity for $\mathcal{Q}$. We establish readily verifiable criteria which ensure an embedding $\mathcal{D}(\mathcal{Q}, L^{p_1}, Y)\hookrightarrow\mathcal{D}(\mathcal{P}, L^{p_2}, Z)$, mostly concentrating on the case $Y=\ell_{w}^{q_{1}}(I)$ and $Z=\ell_{v}^{q_{2}}(J)$. The relevant sufficient conditions are $p_{1}\leq p_{2}$, and finiteness of a norm of the form \[ \left\| \left(\left\| (α_{i}\,β_j \cdot v_{j}/w_{i})_{i\in I_{j}}\right\| _{\ell^{t}}\right)_{j\in J}\right\| _{\ell^{s}}<\infty, \] where the sets \[ I_{j}=\{ i\in I : Q_{i}\cap P_{j}\neq\emptyset\} \qquad\text{ for }j\in J \] are defined in terms of the two coverings $\mathcal{Q}=(Q_{i})_{i\in I}$ and $\mathcal{P}=(P_{j})_{j\in J}$. We also show that these criteria are sharp: For almost arbitrary coverings and certain ranges of $p_{1},p_{2}$, our criteria yield a complete characterization. The same holds for arbitrary values of $p_{1},p_{2}$ under more strict assumptions on the coverings. We illustrate the resulting theory by applications to $α$-modulation and Besov spaces. All known embedding results for these spaces are special cases of our approach; often, we improve considerably upon the state of the art.

Submitted 5 October, 2019; v1 submitted 31 May, 2016; originally announced May 2016.

MSC Class: 42B35; 46E15; 46E35

arXiv:1601.02201 [pdf, ps, other]

Embeddings of Decomposition Spaces into Sobolev and BV Spaces

Authors: Felix Voigtlaender

Abstract: In the present paper, we investigate whether an embedding of a decomposition space $\mathcal{D}\left(\mathcal{Q},L^{p},Y\right)$ into a given Sobolev space $W^{k,q}(\mathbb{R}^{d})$ exists. As special cases, this includes embeddings into Sobolev spaces of (homogeneous and inhomogeneous) Besov spaces, ($α$)-modulation spaces, shearlet smoothness spaces and also of a large class of wavelet coorbit spaces, in particular of shearlet-type coorbit spaces. Precisely, we will show that under extremely mild assumptions on the covering $\mathcal{Q}=\left(Q_{i}\right)_{i\in I}$, we have $\mathcal{D}\left(\mathcal{Q},L^{p},Y\right)\hookrightarrow W^{k,q}(\mathbb{R}^{d})$ as soon as $p\leq q$ and $Y\hookrightarrow\ell_{u^{\left(k,p,q\right)}}^{q^{\triangledown}}\left(I\right)$ hold. Here, $q^{\triangledown}=\min\left\{ q,q'\right\} $ and the weight $u^{\left(k,p,q\right)}$ can be easily computed, only based on the covering $\mathcal{Q}$ and on the parameters $k,p,q$. Conversely, a necessary condition for existence of the embedding is that $p\leq q$ and $Y\cap\ell_{0}\left(I\right)\hookrightarrow\ell_{u^{\left(k,p,q\right)}}^{q}\left(I\right)$ hold, where $\ell_{0}\left(I\right)$ denotes the space of finitely supported sequences on $I$. All in all, for the range $q \in (0,2]\cup\{\infty\}$, we obtain a complete characterization of existence of the embedding in terms of readily verifiable criteria. We can also completely characterize existence of an embedding of a decomposition space into a BV space.

Submitted 10 January, 2016; originally announced January 2016.

MSC Class: 42B35; 46E15; 46E30; 46E35

arXiv:1412.7158 [pdf, ps, other]

Resolution of the wavefront set using general continuous wavelet transforms

Authors: Jonathan Fell, Hartmut Führ, Felix Voigtlaender

Abstract: We consider the problem of characterizing the wavefront set of a tempered distribution $u\in\mathcal{S}'(\mathbb{R}^{d})$ in terms of its continuous wavelet transform, where the latter is defined with respect to a suitably chosen dilation group $H\subset{\rm GL}(\mathbb{R}^{d})$. In this paper we develop a comprehensive and unified approach that allows us to establish characterizations of the wavefront set in terms of rapid coefficient decay, for a large variety of dilation groups. For this purpose, we introduce two technical conditions on the dual action of the group $H$, called microlocal admissibility and (weak) cone approximation property. Essentially, microlocal admissibility sets up a systematic relationship between the scales in a wavelet dilated by $h\in H$ on one side, and the matrix norm of $h$ on the other side. The (weak) cone approximation property describes the ability of the wavelet system to adapt its frequency-side localization to arbitrary frequency cones. Together, microlocal admissibility and the weak cone approximation property allow the characterization of points in the wavefront set using multiple wavelets. Replacing the weak cone approximation by its stronger counterpart gives access to single wavelet characterizations. We illustrate the scope of our results by discussing -- in any dimension $d\ge2$ -- the similitude, diagonal and shearlet dilation groups, for which we verify the pertinent conditions. As a result, similitude and diagonal groups can be employed for multiple wavelet characterizations, whereas for the shearlet groups a single wavelet suffices. In particular, the shearlet characterization (previously only established for $d=2$) holds in arbitrary dimensions.

Submitted 22 December, 2014; originally announced December 2014.

MSC Class: 42C15; 42C40; 46F12

arXiv:1404.4298 [pdf, ps, other]

Wavelet Coorbit Spaces viewed as Decomposition Spaces

Authors: Hartmut Führ, Felix Voigtlaender

Abstract: In this paper we show that the Fourier transform induces an isomorphism between the coorbit spaces defined by Feichtinger and Gröchenig of the mixed, weighted Lebesgue spaces $L_{v}^{p,q}$ with respect to the quasi-regular representation of a semi-direct product $\mathbb{R}^{d}\rtimes H$ with suitably chosen dilation group $H$, and certain decomposition spaces $\mathcal{D}\left(\mathcal{Q},L^{p},\ell_{u}^{q}\right)$ (essentially as introduced by Feichtinger and Gröbner), where the localized "parts" of a function are measured in the $\mathcal{F}L^{p}$-norm. This equivalence is useful in several ways: It provides access to a Fourier-analytic understanding of wavelet coorbit spaces, and it allows one to discuss coorbit spaces associated to different dilation groups in a common framework. As an illustration of these points, we include a short discussion of dilation invariance properties of coorbit spaces associated to different types of dilation groups.

Submitted 16 April, 2014; originally announced April 2014.

MSC Class: 42B35; 42C40; 46F05

Showing 1–48 of 48 results for author: Voigtlaender, F