Search | arXiv e-print repository

On the expressivity of deep Heaviside networks

Authors: Insung Kong, Juntong Chen, Sophie Langer, Johannes Schmidt-Hieber

Abstract: We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network classes. As an application, we derive statistical convergence rates for DHN fits in the nonparametric regression model.

Submitted 30 April, 2025; originally announced May 2025.

Comments: 61 pages, 16 figures

arXiv:2503.11891 [pdf, other]

Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization

Authors: Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

Abstract: We analyze the landscape and training dynamics of diagonal linear networks in a linear regression task, with the network parameters being perturbed by small isotropic normal noise. The addition of such noise may be interpreted as a stochastic form of sharpness-aware minimization (SAM) and we prove several results that relate its action on the underlying landscape and training dynamics to the sharpness of the loss. In particular, the noise changes the expected gradient to force balancing of the weight matrices at a fast rate along the descent trajectory. In the diagonal linear model, we show that this equates to minimizing the average sharpness, as well as the trace of the Hessian matrix, among all possible factorizations of the same matrix. Further, the noise forces the gradient descent iterates towards a shrinkage-thresholding of the underlying true parameter, with the noise level explicitly regulating both the shrinkage factor and the threshold.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 54 pages, 3 figures

arXiv:2410.20068 [pdf, other]

Understanding the Effect of GCN Convolutions in Regression Tasks

Authors: Juntong Chen, Johannes Schmidt-Hieber, Claire Donnat, Olga Klopp

Abstract: Graph Convolutional Networks (GCNs) have become a pivotal method in machine learning for modeling functions over graphs. Despite their widespread success across various applications, their statistical properties (e.g., consistency, convergence rates) remain ill-characterized. To begin addressing this knowledge gap, we consider networks for which the graph structure implies that neighboring nodes exhibit similar signals and provide statistical theory for the impact of convolution operators. Focusing on estimators based solely on neighborhood aggregation, we examine how two common convolutions - the original GCN and GraphSAGE convolutions - affect the learning error as a function of the neighborhood topology and the number of convolutional layers. We explicitly characterize the bias-variance type trade-off incurred by GCNs as a function of the neighborhood size and identify specific graph topologies where convolution operators are less effective. Our theoretical findings are corroborated by synthetic experiments, and provide a start to a deeper quantitative understanding of convolutional effects in GCNs for offering rigorous guidelines for practitioners.

Submitted 16 April, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

Comments: 25 pages

MSC Class: 62G08; 68R10

arXiv:2410.15800 [pdf, ps, other]

On the VC dimension of deep group convolutional neural networks

Authors: Anna Sepliarskaia, Sophie Langer, Johannes Schmidt-Hieber

Abstract: We study the generalization capabilities of Group Convolutional Neural Networks (GCNNs) with ReLU activation function by deriving upper and lower bounds for their Vapnik-Chervonenkis (VC) dimension. Specifically, we analyze how factors such as the number of layers, weights, and input dimension affect the VC dimension. We further compare the derived bounds to those known for other types of neural networks. Our findings extend previous results on the VC dimension of continuous GCNNs with two layers, thereby providing new insights into the generalization properties of GCNNs, particularly regarding the dependence on the input resolution of the data.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2409.07434 [pdf, other]

Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models

Authors: Jiaqi Li, Johannes Schmidt-Hieber, Wei Biao Wu

Abstract: This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide quenched central limit theorems (CLT) for the difference between dropout and $\ell^2$-regularized iterates, regardless of initialization. The CLT for the difference between the Ruppert-Polyak averaged SGD (ASGD) with dropout and $\ell^2$-regularized iterates is also presented. Based on these asymptotic normality results, we further introduce an online estimator for the long-run covariance matrix of ASGD dropout to facilitate inference in a recursive manner with efficiency in computational time and memory. The numerical experiments demonstrate that for sufficiently large samples, the proposed confidence intervals for ASGD with dropout nearly achieve the nominal coverage probability.

Submitted 11 September, 2024; originally announced September 2024.

Comments: 77 pages, 5 figures, 4 tables

MSC Class: 62E20; 62F12; 68W27

arXiv:2306.10529 [pdf, other]

Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model

Authors: Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

Abstract: We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and $\ell_2$-regularization in the linear model. We indicate a more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout. Further, we study a simplified variant of dropout which does not have a regularizing effect and converges to the least squares estimator.

Submitted 25 April, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 52 pages, 2 figures

Journal ref: Journal of Machine Learning Research, 25(204), 2024

arXiv:2205.07764 [pdf, ps, other]

On the inability of Gaussian process regression to optimally learn compositional functions

Authors: Matteo Giordano, Kolyan Ray, Johannes Schmidt-Hieber

Abstract: We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size $n$.

Submitted 27 September, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: 20 pages, to appear in Advances in Neural Information Processing Systems 36 (NeurIPS 2022)

arXiv:2201.04545 [pdf, other]

doi 10.1109/TIT.2022.3215088

On generalization bounds for deep networks based on loss surface implicit regularization

Authors: Masaaki Imaizumi, Johannes Schmidt-Hieber

Abstract: The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for SGD to remain in these stagnation sets are derived. If stagnation occurs, we derive a bound on the generalization error of deep neural networks involving the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and local uniform convergence of the empirical loss functions based on the entropy of suitable neighborhoods around local minima.

Submitted 16 October, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: To appear in IEEE Transactions on Information Theory

arXiv:2007.15884 [pdf, other]

The Kolmogorov-Arnold representation theorem revisited

Authors: Johannes Schmidt-Hieber

Abstract: There is a longstanding debate whether the Kolmogorov-Arnold representation theorem can explain the use of more than one hidden layer in neural networks. The Kolmogorov-Arnold representation decomposes a multivariate function into an interior and an outer function and therefore has indeed a similar structure as a neural network with two hidden layers. But there are distinctive differences. One of the main obstacles is that the outer function depends on the represented function and can be wildly varying even if the represented function is smooth. We derive modifications of the Kolmogorov-Arnold representation that transfer smoothness properties of the represented function to the outer function and can be well approximated by ReLU networks. It appears that instead of two hidden layers, a more natural interpretation of the Kolmogorov-Arnold representation is that of a deep neural network where most of the layers are required to approximate the interior function.

Submitted 2 January, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

Comments: 21 pages

MSC Class: 41A30

arXiv:2006.00278 [pdf, ps, other]

On lower bounds for the bias-variance trade-off

Authors: Alexis Derumigny, Johannes Schmidt-Hieber

Abstract: It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to which extent the bias-variance trade-off is unavoidable and allows to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or $χ^2$-divergence. In a second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. In the Gaussian sequence model, different phase transitions of the bias-variance trade-off occur. Although there is a non-trivial interplay between bias and variance, the rate of the squared bias and the variance do not have to be balanced in order to achieve the minimax estimation rate.

Submitted 20 March, 2023; v1 submitted 30 May, 2020; originally announced June 2020.

Comments: 52 pages, 2 figures, 1 table

MSC Class: 62G05; 62C05; 62C20

arXiv:1908.00695 [pdf, other]

Deep ReLU network approximation of functions on a manifold

Authors: Johannes Schmidt-Hieber

Abstract: Whereas recovery of the manifold from data is a well-studied topic, approximation rates for functions defined on manifolds are less known. In this work, we study a regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension. It is shown that sparsely connected deep ReLU networks can approximate a Hölder function with smoothness index $β$ up to error $ε$ using of the order of $ε^{-d^*/β}\log(1/ε)$ many non-zero network parameters. As an application, we derive statistical convergence rates for the estimator minimizing the empirical risk over all possible choices of bounded network parameters.

Submitted 2 August, 2019; originally announced August 2019.

arXiv:1804.02253 [pdf, other]

A comparison of deep networks with ReLU activation function and linear spline-type methods

Authors: Konstantin Eckle, Johannes Schmidt-Hieber

Abstract: Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Since the function spaces induced by shallow networks have several approximation theoretic drawbacks, this explains, however, not necessarily the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation function to piecewise linear spline methods. We show that MARS (multivariate adaptive regression splines) is improper learnable by DNNs in the sense that for any given function that can be expressed as a function in MARS with $M$ parameters there exists a multilayer neural network with $O(M \log (M/\varepsilon))$ parameters that approximates this function up to sup-norm error $\varepsilon.$ We show a similar result for expansions with respect to the Faber-Schauder system. Based on this, we derive risk comparison inequalities that bound the statistical risk of fitting a neural network by the statistical risk of spline-based methods. This shows that deep networks perform better or only slightly worse than the considered spline methods. We provide a constructive proof for the function approximations.

Submitted 24 September, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

MSC Class: 62G08 (Primary); 62G20 (Secondary)

arXiv:1708.06633 [pdf, other]

doi 10.1214/19-AOS1875

Nonparametric regression using deep neural networks with ReLU activation function

Authors: Johannes Schmidt-Hieber

Abstract: Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks with number of potential network parameters exceeding the sample size. The analysis gives some insights into why multilayer feedforward neural networks perform well in practice. Interestingly, for ReLU activation function the depth (number of layers) of the neural network architectures plays an important role and our theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural. It is also shown that under the composition assumption wavelet estimators can only achieve suboptimal rates.

Submitted 13 September, 2020; v1 submitted 22 August, 2017; originally announced August 2017.

Comments: article, rejoinder and supplementary material

MSC Class: 62G08

Journal ref: Article: Annals of Statistics, Volume 48, Number 4, 1875-1897, 2020, Rejoinder: Annals of Statistics, Volume 48, Number 4, 1916-1921, 2020

arXiv:1704.01066 [pdf, other]

Tests for qualitative features in the random coefficients model

Authors: Fabian Dunker, Konstantin Eckle, Katharina Proksch, Johannes Schmidt-Hieber

Abstract: The random coefficients model is an extension of the linear regression model that allows for unobserved heterogeneity in the population by modeling the regression coefficients as random variables. Given data from this model, the statistical challenge is to recover information about the joint density of the random coefficients which is a multivariate and ill-posed problem. Because of the curse of dimensionality and the ill-posedness, pointwise nonparametric estimation of the joint density is difficult and suffers from slow convergence rates. Larger features, such as an increase of the density along some direction or a well-accentuated mode can, however, be much easier detected from data by means of statistical tests. In this article, we follow this strategy and construct tests and confidence statements for qualitative features of the joint density, such as increases, decreases and modes. We propose a multiple testing approach based on aggregating single tests which are designed to extract shape information on fixed scales and directions. Using recent tools for Gaussian approximations of multivariate empirical processes, we derive expressions for the critical value. We apply our method to simulated and real data.

Submitted 13 March, 2018; v1 submitted 4 April, 2017; originally announced April 2017.

MSC Class: 62G10; 62G15; 62G20

arXiv:1512.00218 [pdf, ps, other]

doi 10.1088/0266-5611/32/6/065003

Minimax theory for a class of non-linear statistical inverse problems

Authors: Kolyan Ray, Johannes Schmidt-Hieber

Abstract: We study a class of statistical inverse problems with non-linear pointwise operators motivated by concrete statistical applications. A two-step procedure is proposed, where the first step smoothes the data and inverts the non-linearity. This reduces the initial non-linear problem to a linear inverse problem with deterministic noise, which is then solved in a second step. The noise reduction step is based on wavelet thresholding and is shown to be minimax optimal (up to logarithmic factors) in a pointwise function-dependent sense. Our analysis is based on a modified notion of Hölder smoothness scales that are natural in this setting.

Submitted 11 May, 2016; v1 submitted 1 December, 2015; originally announced December 2015.

Comments: 37 pages

MSC Class: 62G05 (Primary); 62G08; 62G20 (Secondary)

Journal ref: Inverse Problems 32 (2016) 065003

arXiv:1403.0735 [pdf, ps, other]

doi 10.1214/15-AOS1334

Bayesian linear regression with sparse priors

Authors: Ismaël Castillo, Johannes Schmidt-Hieber, Aad van der Vaart

Abstract: We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed to the construction and study of credible sets for uncertainty quantification.

Submitted 14 October, 2015; v1 submitted 4 March, 2014; originally announced March 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1334 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1334

Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 1986-2018

arXiv:1309.6178 [pdf, ps, other]

Spot volatility estimation for high-frequency data: adaptive estimation in practice

Authors: Till Sabel, Johannes Schmidt-Hieber, Axel Munk

Abstract: We develop further the spot volatility estimator introduced in Hoffmann, Munk and Schmidt-Hieber (2012) from a practical point of view and make it useful for the analysis of high-frequency financial data. In a first part, we adjust the estimator substantially in order to achieve good finite sample performance and to overcome difficulties arising from violations of the additive microstructure noise model (e.g. jumps, rounding errors). These modifications are justified by simulations. The second part is devoted to investigate the behavior of volatility in response to macroeconomic events. We give evidence that the spot volatility of Euro-BUND futures is considerably higher during press conferences of the European Central Bank. As an outlook, we present an estimator for the spot covolatility of two different prices.

Submitted 24 September, 2013; originally announced September 2013.

MSC Class: 91B84; 62G08; 65T60; 62M99

arXiv:1303.3118 [pdf, ps, other]

On an estimator achieving the adaptive rate in nonparametric regression under $L^p$-loss for all $1\leq p \leq \infty$

Authors: Johannes Schmidt-Hieber

Abstract: Consider nonparametric function estimation under $L^p$-loss. The minimax rate for estimation of the regression function over a Hölder ball with smoothness index $β$ is $n^{-β/(2β+1)}$ if $1\leq p<\infty$ and $(n/\log n)^{-β/(2β+1)}$ if $p=\infty.$ There are many known procedures that either attain this rate for $p=\infty$ but are suboptimal by a $\log n$ factor in the case $p<\infty$ or the other way around. In this article, we construct an estimator that simultaneously achieves the optimal rates under $L^p$-risk for all $1\leq p\leq \infty$ without prior knowledge of $β.$ In contrast to classical wavelet thresholding methods that kill small empirical wavelet coefficients and keep large ones, it is essential for simultaneous adaptation that on each resolution level, the largest empirical wavelet coefficients are truncated. This leads to a completely different point of view on wavelet thresholding. The crucial part in the construction of the estimator is the size of the truncation level which is linked to the unknown smoothness index. Although estimation of the smoothness index is known to be a difficult task, there is a data-driven choice of the truncation level that is sufficiently precise for our purpose.

Submitted 7 February, 2015; v1 submitted 13 March, 2013; originally announced March 2013.

Comments: 21 pages

arXiv:1107.1404 [pdf, other]

Multiscale Methods for Shape Constraints in Deconvolution: Confidence Statements for Qualitative Features

Authors: Johannes Schmidt-Hieber, Axel Munk, Lutz Duembgen

Abstract: We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing, we consider a calibration, motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both the theoretical and simulation based point of view. A major consequence of our work is that the detection of qualitative features of a density in a deconvolution problem is a doable task although the minimax rates for pointwise estimation are very slow.

Submitted 17 December, 2012; v1 submitted 7 July, 2011; originally announced July 2011.

Comments: 55 pages, 5 figures, This is a revised version of a previous paper with the title: "Multiscale Methods for Shape Constraints in Deconvolution"

MSC Class: 62G10 (Primary) 62G15; 62G20 (Secondary)

arXiv:0908.3163 [pdf, other]

Nonparametric estimation of the volatility function in a high-frequency model corrupted by noise

Authors: Axel Munk, Johannes Schmidt-Hieber

Abstract: We consider the models $Y_{i,n}=\int_0^{i/n} σ(s)dW_s+τ(i/n)ε_{i,n}$ and $\tilde Y_{i,n}=σ(i/n)W_{i/n}+τ(i/n)ε_{i,n}$, $i=1,\ldots,n$, where $W_t$ denotes a standard Brownian motion and $ε_{i,n}$ are centered i.i.d. random variables with $E(ε_{i,n}^2)=1$ and finite fourth moment. Furthermore, $σ$ and $τ$ are unknown deterministic functions and $W_t$ and $(ε_{1,n},\ldots,ε_{n,n})$ are assumed to be independent processes. Based on a spectral decomposition of the covariance structures we derive series estimators for $σ^2$ and $τ^2$ and investigate their rate of convergence of the MISE in dependence of their smoothness. To this end specific basis functions and their corresponding Sobolev ellipsoids are introduced and we show that our estimators are optimal in minimax sense. Our work is motivated by microstructure noise models. Our major finding is that the microstructure noise $ε_{i,n}$ introduces an additional degree of ill-posedness of 1/2, irrespective of the tail behavior of $ε_{i,n}$. The method is illustrated by a small numerical study.

Submitted 6 April, 2010; v1 submitted 21 August, 2009; originally announced August 2009.

Comments: 5 figures, corrected references, minor changes

Showing 1–20 of 20 results for author: Schmidt-Hieber, J