Showing 1–13 of 13 results for author: Krzyzak, A

  1. arXiv:2504.08489  [pdf, other]

    math.ST cs.LG stat.ML

    Statistically guided deep learning

    Authors: Michael Kohler, Adam Krzyzak

    Abstract: We present a theoretically well-founded deep learning algorithm for nonparametric regression. It uses over-parametrized deep neural networks with logistic activation function, which are fitted to the given data via gradient descent. We propose a special topology of these networks, a special random initialization of the weights, and a data-dependent choice of the learning rate and the number of gra…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2504.03405

  2. arXiv:2405.07619  [pdf, ps, other]

    stat.ML cs.LG

    Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent

    Authors: Michael Kohler, Adam Krzyzak, Benjamin Walter

    Abstract: Image classification based on over-parametrized convolutional neural networks with a global average-pooling layer is considered. The weights of the network are learned by gradient descent. A bound on the rate of convergence of the difference between the misclassification risk of the newly introduced convolutional neural network estimate and the minimal possible value is derived.

    Submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2404.07128  [pdf, ps, other]

    math.ST

    Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization

    Authors: Michael Kohler, Adam Krzyzak, Alisha Sänger

    Abstract: Image classification from independent and identically distributed random variables is considered. Image classifiers are defined which are based on a linear combination of deep convolutional networks with max-pooling layer. Here all the weights are learned by stochastic gradient descent. A general result is presented which shows that the image classifiers are able to approximate the best possible d…

    Submitted 5 March, 2025; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.17007

  4. arXiv:2312.17007  [pdf, ps, other]

    cs.LG math.ST stat.ML

    On the rate of convergence of an over-parametrized Transformer classifier learned by gradient descent

    Authors: Michael Kohler, Adam Krzyzak

    Abstract: One of the most recent and fascinating breakthroughs in artificial intelligence is ChatGPT, a chatbot which can simulate human conversation. ChatGPT is an instance of GPT4, which is a language model based on generative pre-trained transformers. So if one wants to study, from a theoretical point of view, how powerful such artificial intelligence can be, one approach is to consider transformer network…

    Submitted 20 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  5. arXiv:2210.01443  [pdf, ps, other]

    math.ST

    Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent

    Authors: Michael Kohler, Adam Krzyzak

    Abstract: Estimation of a regression function from independent and identically distributed random variables is considered. The $L_2$ error with integration with respect to the design measure is used as an error criterion. Over-parametrized deep neural network estimates are defined where all the weights are learned by gradient descent. It is shown that the expected $L_2$ error of these estimates converge…

    Submitted 4 October, 2022; originally announced October 2022.

  6. arXiv:2011.00328  [pdf, other]

    stat.ML cs.LG

    On the rate of convergence of a deep recurrent neural network estimate in a regression problem with dependent data

    Authors: Michael Kohler, Adam Krzyzak

    Abstract: A regression problem with dependent data is considered. Regularity assumptions on the dependency of the data are introduced, and it is shown that under suitable structural assumptions on the regression function a deep recurrent neural network estimate is able to circumvent the curse of dimensionality.

    Submitted 31 October, 2020; originally announced November 2020.

  7. arXiv:2003.01526  [pdf, other]

    stat.ML cs.LG

    On the rate of convergence of image classifiers based on convolutional neural networks

    Authors: M. Kohler, A. Krzyzak, B. Walter

    Abstract: Image classifiers based on convolutional neural networks are defined, and the rate of convergence of the misclassification risk of the estimates towards the optimal misclassification risk is analyzed. Under suitable assumptions on the smoothness and structure of the a posteriori probability, a rate of convergence is shown which is independent of the dimension of the image. This proves that in image…

    Submitted 14 October, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

  8. arXiv:1912.05436  [pdf, ps, other]

    math.ST

    Analysis of the rate of convergence of neural network regression estimates which are easy to implement

    Authors: Alina Braun, Michael Kohler, Adam Krzyzak

    Abstract: Recent results in nonparametric regression show that for deep learning, i.e., for neural network estimates with many hidden layers, we are able to achieve good rates of convergence even in the case of high-dimensional predictor variables, provided suitable assumptions on the structure of the regression function are imposed. The estimates are defined by minimizing the empirical $L_2$ risk over a class…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: arXiv admin note: text overlap with arXiv:1912.03921

  9. arXiv:1912.03925  [pdf, ps, other]

    math.ST

    Over-parametrized deep neural networks do not generalize well

    Authors: Michael Kohler, Adam Krzyzak

    Abstract: Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting, and a lower bound is presented which proves that these networks do not genera…

    Submitted 14 January, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

  10. arXiv:1908.11140  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Estimation of a function of low local dimensionality by deep neural networks

    Authors: Michael Kohler, Adam Krzyzak, Sophie Langer

    Abstract: Deep neural networks (DNNs) achieve impressive results for complicated tasks like object detection on images and speech recognition. Motivated by this practical success, there is now a strong interest in showing good theoretical properties of DNNs. To describe for which tasks DNNs perform well and when they fail, it is a key challenge to understand their performance. The aim of this paper is to co…

    Submitted 15 June, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

  11. Discretization of quaternionic continuous wavelet transforms

    Authors: A. Askari Hemmat, K. Thirulogasanthar, A. Krzyzak

    Abstract: A scheme to form a basis and a frame for a Hilbert space of quaternion valued square integrable functions from a basis and a frame, respectively, of a Hilbert space of complex valued square integrable functions is introduced. Using the discretization techniques for 2D-continuous wavelet transform of the $SIM(2)$ group, the quaternionic continuous wavelet transform, living in a complex valued Hilber…

    Submitted 8 December, 2016; originally announced December 2016.

    Comments: 18 pages. arXiv admin note: text overlap with arXiv:1402.3109

    MSC Class: 81R30; 42C40; 42C15

    Journal ref: J. Geom. Phys., 117 (2017), 36-49

  12. arXiv:1201.0586  [pdf, ps, other]

    math.ST

    An Affine Invariant $k$-Nearest Neighbor Regression Estimate

    Authors: Gérard Biau, Luc Devroye, Vida Dujmovic, Adam Krzyzak

    Abstract: We design a data-dependent metric in $\mathbb R^d$ and use it to define the $k$-nearest neighbors of a given point. Our metric is invariant under all affine transformations. We show that, with this metric, the standard $k$-nearest neighbor regression estimate is asymptotically consistent under the usual conditions on $k$, and minimal requirements on the input data.

    Submitted 18 May, 2012; v1 submitted 3 January, 2012; originally announced January 2012.

  13. Multi Matrix Vector Coherent States

    Authors: K. Thirulogasanthar, G. Honnouvo, A. Krzyzak

    Abstract: A class of vector coherent states is derived with multiple of matrices as vectors in a Hilbert space, where the Hilbert space is taken to be the tensor product of several other Hilbert spaces. As examples vector coherent states with multiple of quaternions and octonions are given. The resulting generalized oscillator algebra is briefly discussed. Further, vector coherent states for a tensored Ha…

    Submitted 2 September, 2004; v1 submitted 29 August, 2003; originally announced August 2003.

    Comments: 24 pages

    MSC Class: 82R30

    Journal ref: Ann. Physics. 314 (2004) 119-144.