Showing 1–20 of 20 results for author: Schmidt-Hieber, J

Searching in archive stat.
  1. arXiv:2505.00110

    stat.ML cs.LG math.NA

    On the expressivity of deep Heaviside networks

    Authors: Insung Kong, Juntong Chen, Sophie Langer, Johannes Schmidt-Hieber

    Abstract: We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network classes. As an application, we derive statistical convergence rates for DHN fits in the nonparametric regre…

    Submitted 30 April, 2025; originally announced May 2025.

    Comments: 61 pages, 16 figures

  2. arXiv:2503.11891

    cs.LG math.ST stat.ML

    Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization

    Authors: Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

    Abstract: We analyze the landscape and training dynamics of diagonal linear networks in a linear regression task, with the network parameters being perturbed by small isotropic normal noise. The addition of such noise may be interpreted as a stochastic form of sharpness-aware minimization (SAM) and we prove several results that relate its action on the underlying landscape and training dynamics to the sharp…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 54 pages, 3 figures

  3. arXiv:2410.20068

    cs.LG math.ST stat.ML

    Understanding the Effect of GCN Convolutions in Regression Tasks

    Authors: Juntong Chen, Johannes Schmidt-Hieber, Claire Donnat, Olga Klopp

    Abstract: Graph Convolutional Networks (GCNs) have become a pivotal method in machine learning for modeling functions over graphs. Despite their widespread success across various applications, their statistical properties (e.g., consistency, convergence rates) remain ill-characterized. To begin addressing this knowledge gap, we consider networks for which the graph structure implies that neighboring nodes e…

    Submitted 16 April, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

    Comments: 25 pages

    MSC Class: 62G08; 68R10

  4. arXiv:2410.15800

    cs.LG math.ST stat.ML

    On the VC dimension of deep group convolutional neural networks

    Authors: Anna Sepliarskaia, Sophie Langer, Johannes Schmidt-Hieber

    Abstract: We study the generalization capabilities of Group Convolutional Neural Networks (GCNNs) with ReLU activation function by deriving upper and lower bounds for their Vapnik-Chervonenkis (VC) dimension. Specifically, we analyze how factors such as the number of layers, weights, and input dimension affect the VC dimension. We further compare the derived bounds to those known for other types of neural n…

    Submitted 21 October, 2024; originally announced October 2024.

  5. arXiv:2409.07434

    stat.ML cs.LG math.ST

    Asymptotics of Stochastic Gradient Descent with Dropout Regularization in Linear Models

    Authors: Jiaqi Li, Johannes Schmidt-Hieber, Wei Biao Wu

    Abstract: This paper proposes an asymptotic theory for online inference of the stochastic gradient descent (SGD) iterates with dropout regularization in linear regression. Specifically, we establish the geometric-moment contraction (GMC) for constant step-size SGD dropout iterates to show the existence of a unique stationary distribution of the dropout recursive function. By the GMC property, we provide que…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 77 pages, 5 figures, 4 tables

    MSC Class: 62E20; 62F12; 68W27

  6. arXiv:2306.10529

    math.ST stat.ML

    Dropout Regularization Versus $\ell_2$-Penalization in the Linear Model

    Authors: Gabriel Clara, Sophie Langer, Johannes Schmidt-Hieber

    Abstract: We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for the convergence of expectations and covariance matrices of the iterates are derived. The results shed more light on the widely cited connection between dropout and l2-regularization in the linear model. We indicate a more subtle relationship, ow…

    Submitted 25 April, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: 52 pages, 2 figures

    Journal ref: Journal of Machine Learning Research, 25(204), 2024

  7. arXiv:2205.07764

    stat.ML cs.LG math.ST

    On the inability of Gaussian process regression to optimally learn compositional functions

    Authors: Matteo Giordano, Kolyan Ray, Johannes Schmidt-Hieber

    Abstract: We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on…

    Submitted 27 September, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

    Comments: 20 pages, to appear in Advances in Neural Information Processing Systems 36 (NeurIPS 2022)

  8. On generalization bounds for deep networks based on loss surface implicit regularization

    Authors: Masaaki Imaizumi, Johannes Schmidt-Hieber

    Abstract: The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by sto…

    Submitted 16 October, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

    Comments: To appear in IEEE Transactions on Information Theory

  9. arXiv:2007.15884

    cs.LG cs.NE stat.ML

    The Kolmogorov-Arnold representation theorem revisited

    Authors: Johannes Schmidt-Hieber

    Abstract: There is a longstanding debate whether the Kolmogorov-Arnold representation theorem can explain the use of more than one hidden layer in neural networks. The Kolmogorov-Arnold representation decomposes a multivariate function into an interior and an outer function and therefore has indeed a similar structure as a neural network with two hidden layers. But there are distinctive differences. One of…

    Submitted 2 January, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

    Comments: 21 pages

    MSC Class: 41A30

  10. arXiv:2006.00278

    math.ST stat.ML

    On lower bounds for the bias-variance trade-off

    Authors: Alexis Derumigny, Johannes Schmidt-Hieber

    Abstract: It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a…

    Submitted 20 March, 2023; v1 submitted 30 May, 2020; originally announced June 2020.

    Comments: 52 pages, 2 figures, 1 table

    MSC Class: 62G05; 62C05; 62C20

  11. arXiv:1908.00695

    stat.ML cs.LG

    Deep ReLU network approximation of functions on a manifold

    Authors: Johannes Schmidt-Hieber

    Abstract: Whereas recovery of the manifold from data is a well-studied topic, approximation rates for functions defined on manifolds are less known. In this work, we study a regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension. It is shown that sparsely connected deep ReLU networks can approximate a Hölder function with s…

    Submitted 2 August, 2019; originally announced August 2019.

  12. arXiv:1804.02253

    stat.ML cs.LG stat.ME

    A comparison of deep networks with ReLU activation function and linear spline-type methods

    Authors: Konstantin Eckle, Johannes Schmidt-Hieber

    Abstract: Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Since the function spaces induced by shallow networks have several approximation theoretic drawbacks, this explains, however, not necessarily the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation function to piecewise linear spline met…

    Submitted 24 September, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

    MSC Class: 62G08 (Primary); 62G20 (Secondary)

  13. arXiv:1708.06633

    math.ST cs.LG stat.ML

    Nonparametric regression using deep neural networks with ReLU activation function

    Authors: Johannes Schmidt-Hieber

    Abstract: Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constrain…

    Submitted 13 September, 2020; v1 submitted 22 August, 2017; originally announced August 2017.

    Comments: article, rejoinder and supplementary material

    MSC Class: 62G08

    Journal ref: Article: Annals of Statistics, Volume 48, Number 4, 1875-1897, 2020, Rejoinder: Annals of Statistics, Volume 48, Number 4, 1916-1921, 2020

  14. arXiv:1704.01066

    stat.ME econ.EM

    Tests for qualitative features in the random coefficients model

    Authors: Fabian Dunker, Konstantin Eckle, Katharina Proksch, Johannes Schmidt-Hieber

    Abstract: The random coefficients model is an extension of the linear regression model that allows for unobserved heterogeneity in the population by modeling the regression coefficients as random variables. Given data from this model, the statistical challenge is to recover information about the joint density of the random coefficients which is a multivariate and ill-posed problem. Because of the curse of d…

    Submitted 13 March, 2018; v1 submitted 4 April, 2017; originally announced April 2017.

    MSC Class: 62G10; 62G15; 62G20

  15. Minimax theory for a class of non-linear statistical inverse problems

    Authors: Kolyan Ray, Johannes Schmidt-Hieber

    Abstract: We study a class of statistical inverse problems with non-linear pointwise operators motivated by concrete statistical applications. A two-step procedure is proposed, where the first step smoothes the data and inverts the non-linearity. This reduces the initial non-linear problem to a linear inverse problem with deterministic noise, which is then solved in a second step. The noise reduction step i…

    Submitted 11 May, 2016; v1 submitted 1 December, 2015; originally announced December 2015.

    Comments: 37 pages

    MSC Class: 62G05 (Primary); 62G08; 62G20 (Secondary)

    Journal ref: Inverse Problems 32 (2016) 065003

  16. arXiv:1403.0735

    math.ST stat.ME

    Bayesian linear regression with sparse priors

    Authors: Ismaël Castillo, Johannes Schmidt-Hieber, Aad van der Vaart

    Abstract: We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It…

    Submitted 14 October, 2015; v1 submitted 4 March, 2014; originally announced March 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOS1334 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1334

    Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 1986-2018

  17. arXiv:1309.6178

    stat.AP

    Spot volatility estimation for high-frequency data: adaptive estimation in practice

    Authors: Till Sabel, Johannes Schmidt-Hieber, Axel Munk

    Abstract: We develop further the spot volatility estimator introduced in Hoffmann, Munk and Schmidt-Hieber (2012) from a practical point of view and make it useful for the analysis of high-frequency financial data. In a first part, we adjust the estimator substantially in order to achieve good finite sample performance and to overcome difficulties arising from violations of the additive microstructure noise…

    Submitted 24 September, 2013; originally announced September 2013.

    MSC Class: 91B84; 62G08; 65T60; 62M99

  18. arXiv:1303.3118

    math.ST stat.ME

    On an estimator achieving the adaptive rate in nonparametric regression under $L^p$-loss for all $1\leq p \leq \infty$

    Authors: Johannes Schmidt-Hieber

    Abstract: Consider nonparametric function estimation under $L^p$-loss. The minimax rate for estimation of the regression function over a Hölder ball with smoothness index $β$ is $n^{-β/(2β+1)}$ if $1\leq p<\infty$ and $(n/\log n)^{-β/(2β+1)}$ if $p=\infty.$ There are many known procedures that either attain this rate for $p=\infty$ but are suboptimal by a $\log n$ factor in the case $p<\infty$ or the other…

    Submitted 7 February, 2015; v1 submitted 13 March, 2013; originally announced March 2013.

    Comments: 21 pages

  19. arXiv:1107.1404

    math.ST stat.ME

    Multiscale Methods for Shape Constraints in Deconvolution: Confidence Statements for Qualitative Features

    Authors: Johannes Schmidt-Hieber, Axel Munk, Lutz Duembgen

    Abstract: We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing…

    Submitted 17 December, 2012; v1 submitted 7 July, 2011; originally announced July 2011.

    Comments: 55 pages, 5 figures, This is a revised version of a previous paper with the title: "Multiscale Methods for Shape Constraints in Deconvolution"

    MSC Class: 62G10 (Primary) 62G15; 62G20 (Secondary)

  20. arXiv:0908.3163

    stat.ME math.ST

    Nonparametric estimation of the volatility function in a high-frequency model corrupted by noise

    Authors: Axel Munk, Johannes Schmidt-Hieber

    Abstract: We consider the models Y_{i,n}=\int_0^{i/n} σ(s)dW_s+τ(i/n)ε_{i,n}, and \tilde Y_{i,n}=σ(i/n)W_{i/n}+τ(i/n)ε_{i,n}, i=1,...,n, where W_t denotes a standard Brownian motion and ε_{i,n} are centered i.i.d. random variables with E(ε_{i,n}^2)=1 and finite fourth moment. Furthermore, σ and τ are unknown deterministic functions and W_t and (ε_{1,n},...,ε_{n,n}) are assumed to be independent processes. Bas…

    Submitted 6 April, 2010; v1 submitted 21 August, 2009; originally announced August 2009.

    Comments: 5 figures, corrected references, minor changes