-
Transient performance of MPC for tracking without terminal constraints
Authors:
Nadine Ehmann,
Matthias Köhler,
Frank Allgöwer
Abstract:
Model predictive control (MPC) for tracking is a recently introduced approach, which extends standard MPC formulations by incorporating an artificial reference as an additional optimization variable, in order to track external and potentially time-varying references. In this work, we analyze the performance of such an MPC for tracking scheme without a terminal cost and terminal constraints. We derive a transient performance estimate, i.e. a bound on the closed-loop performance over an arbitrary time interval, yielding insights on how to select the scheme's parameters for performance. Furthermore, we show that in the asymptotic case, where the prediction horizon and observed time interval tend to infinity, the closed-loop solution of MPC for tracking recovers the infinite horizon optimal solution.
Submitted 12 June, 2025;
originally announced June 2025.
-
Statistically guided deep learning
Authors:
Michael Kohler,
Adam Krzyzak
Abstract:
We present a theoretically well-founded deep learning algorithm for nonparametric regression. It uses over-parametrized deep neural networks with logistic activation function, which are fitted to the given data via gradient descent. We propose a special topology of these networks, a special random initialization of the weights, and a data-dependent choice of the learning rate and the number of gradient descent steps. We prove a theoretical bound on the expected $L_2$ error of this estimate, and illustrate its finite sample size performance by applying it to simulated data. Our results show that a theoretical analysis of deep learning which takes into account simultaneously optimization, generalization and approximation can result in a new deep learning estimate which has an improved finite sample performance.
Submitted 11 April, 2025;
originally announced April 2025.
-
On the rate of convergence of an over-parametrized deep neural network regression estimate learned by gradient descent
Authors:
Michael Kohler
Abstract:
Nonparametric regression with random design is considered. The $L_2$ error with integration with respect to the design measure is used as the error criterion. An over-parametrized deep neural network regression estimate with logistic activation function is defined, where all weights are learned by gradient descent. It is shown that the estimate achieves a nearly optimal rate of convergence in case that the regression function is $(p,C)$-smooth.
Submitted 4 April, 2025;
originally announced April 2025.
-
Quasiconvex relaxation of planar Biot-type energies and the role of determinant constraints
Authors:
Robert J. Martin,
Ionel-Dumitrel Ghiba,
Maximilian Köhler,
Daniel Balzani,
Oliver Sander,
Patrizio Neff
Abstract:
We derive the quasiconvex relaxation of the Biot-type energy density $\lVert\sqrt{\operatorname{D}\varphi^T \operatorname{D}\varphi}-I_2\rVert^2$ for planar mappings $\varphi\colon\mathbb{R}^2\to \mathbb{R}^2$ in two different scenarios. First, we consider the case $\operatorname{D}\varphi\in\textrm{GL}^+(2)$, in which the energy can be expressed as the squared Euclidean distance $\operatorname{dist}^2(\operatorname{D}\varphi,\textrm{SO}(2))$ to the special orthogonal group $\textrm{SO}(2)$. We then allow for planar mappings with arbitrary $\operatorname{D}\varphi\in\mathbb{R}^{2\times 2}$; in the context of solid mechanics, this lack of determinant constraints on the deformation gradient would allow for self-interpenetration of matter. We demonstrate that the two resulting relaxations do not coincide and compare the analytical findings to numerical results for different relaxation approaches, including a rank-one sequential lamination algorithm, trust-region FEM calculations of representative microstructures and physics-informed neural networks.
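The first scenario rests on the identity $\lVert\sqrt{F^TF}-I_2\rVert^2=\operatorname{dist}^2(F,\textrm{SO}(2))$ for $F\in\textrm{GL}^+(2)$. The sketch below (assuming numpy; all function names are hypothetical) checks this identity numerically and shows that it breaks down once $\det F<0$, which is why dropping the determinant constraint changes the energy. It evaluates the unrelaxed energy only, not the quasiconvex relaxation derived in the paper.

```python
import numpy as np

def biot_energy(F):
    """||sqrt(F^T F) - I_2||^2 (Frobenius norm); defined for every real 2x2 matrix F."""
    w, V = np.linalg.eigh(F.T @ F)                          # F^T F is symmetric PSD
    U = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T   # right stretch tensor
    return float(np.sum((U - np.eye(2)) ** 2))

def dist2_SO2(F):
    """Squared Frobenius distance from F to SO(2), via the SVD (Kabsch-type projection)."""
    W, _, Vt = np.linalg.svd(F)
    d = 1.0 if np.linalg.det(W @ Vt) >= 0 else -1.0   # enforce det R = +1
    R = W @ np.diag([1.0, d]) @ Vt                     # nearest rotation to F
    return float(np.sum((F - R) ** 2))

F_plus = np.array([[2.0, 0.5], [0.1, 1.5]])   # det > 0
F_minus = np.diag([1.0, -1.0])                 # det < 0 (a reflection)
print(biot_energy(F_plus), dist2_SO2(F_plus))    # agree up to rounding
print(biot_energy(F_minus), dist2_SO2(F_minus))  # approximately 0 versus 4
```

For the reflection, $\sqrt{F^TF}=I_2$, so the Biot-type energy vanishes even though $F$ is far from every rotation.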
Submitted 18 January, 2025;
originally announced January 2025.
-
Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization
Authors:
Michael Kohler,
Adam Krzyzak,
Alisha Sänger
Abstract:
Image classification from independent and identically distributed random variables is considered. Image classifiers are defined which are based on a linear combination of deep convolutional networks with max-pooling layer. Here all the weights are learned by stochastic gradient descent. A general result is presented which shows that the image classifiers are able to approximate the best possible deep convolutional network. In case that the a posteriori probability satisfies a suitable hierarchical composition model it is shown that the corresponding deep convolutional neural network image classifier achieves a rate of convergence which is independent of the dimension of the images.
Submitted 5 March, 2025; v1 submitted 10 April, 2024;
originally announced April 2024.
-
On the rate of convergence of an over-parametrized Transformer classifier learned by gradient descent
Authors:
Michael Kohler,
Adam Krzyzak
Abstract:
One of the most recent and fascinating breakthroughs in artificial intelligence is ChatGPT, a chatbot which can simulate human conversation. ChatGPT is an instance of GPT-4, which is a language model based on generative pre-trained transformers. So if one wants to study, from a theoretical point of view, how powerful such artificial intelligence can be, one approach is to consider transformer networks and to study which problems one can solve with these networks theoretically. Here it is important not only what kind of models these networks can approximate, or how they can generalize the knowledge learned by choosing the best possible approximation to a concrete data set, but also how well the optimization of such transformer networks based on a concrete data set works. In this article we consider all three of these aspects simultaneously and show a theoretical upper bound on the misclassification probability of a transformer network fitted to the observed data. For simplicity we focus in this context on transformer encoder networks which can be applied to define an estimate in the context of a classification problem involving natural language.
Submitted 20 June, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Distributed Model Predictive Control for Periodic Cooperation of Multi-Agent Systems
Authors:
Matthias Köhler,
Matthias A. Müller,
Frank Allgöwer
Abstract:
We consider multi-agent systems with heterogeneous, nonlinear agents that are subject to individual constraints and want to achieve a periodic, dynamic cooperative control goal which can be characterised by a set and a suitable cost. We propose a sequential distributed model predictive control (MPC) scheme in which agents sequentially solve an individual optimisation problem to track an artificial periodic output trajectory. The optimisation problems are coupled through these artificial periodic output trajectories, which are communicated and penalised using the cost that characterises the cooperative goal. The agents communicate only their artificial trajectories and only once per time step. We show that under suitable assumptions, the agents can incrementally move their artificial output trajectories towards the cooperative goal, and, hence, their closed-loop output trajectories asymptotically achieve it. We illustrate the scheme with a simulation example.
Submitted 6 April, 2023;
originally announced April 2023.
-
Transient Performance of MPC for Tracking
Authors:
Matthias Köhler,
Lisa Krügel,
Lars Grüne,
Matthias A. Müller,
Frank Allgöwer
Abstract:
We analyse the closed-loop performance of a model predictive control (MPC) for tracking formulation with artificial references. It has been shown that such a scheme guarantees closed-loop stability and recursive feasibility for any externally supplied reference, even if it is unreachable or time-varying. The basic idea is to consider an artificial reference as an additional decision variable and to formulate generalised terminal ingredients with respect to it. In addition, its offset is penalised in the MPC optimisation problem, leading to closed-loop convergence to the best reachable reference. In this paper, we provide a transient performance bound on the closed loop using MPC for tracking. We employ mild assumptions on the offset cost and scale it with the prediction horizon. In this case, an increasing horizon in MPC for tracking recovers the infinite horizon optimal solution.
Submitted 24 January, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Multidimensional rank-one convexification of incremental damage models at finite strains
Authors:
Daniel Balzani,
Maximilian Köhler,
Timo Neumeier,
Malte A. Peter,
Daniel Peterseim
Abstract:
This paper presents computationally feasible rank-one relaxation algorithms for the efficient simulation of a time-incremental damage model with nonconvex incremental stress potentials in multiple spatial dimensions. While the standard model suffers from numerical issues due to the lack of convexity, the relaxation by rank-one convexification prevents non-existence of minimizers and mesh dependence of the solutions of finite element discretizations. By the combination, modification and parallelization of the underlying convexification algorithms, the novel approach becomes computationally feasible. A descent method and a Newton scheme enhanced by step-size control prevent stability issues related to local minima in the energy landscape and the computation of derivatives. Numerical techniques for the construction of continuous derivatives of the approximated rank-one convex envelope are discussed. A series of numerical experiments demonstrates the ability of the computationally relaxed model to capture softening effects and the mesh independence of the computed approximations. An interpretation in terms of microstructural damage evolution is given, based on the rank-one lamination process.
Submitted 9 February, 2023; v1 submitted 24 November, 2022;
originally announced November 2022.
-
Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent
Authors:
Michael Kohler,
Adam Krzyzak
Abstract:
Estimation of a regression function from independent and identically distributed random variables is considered. The $L_2$ error with integration with respect to the design measure is used as an error criterion. Over-parametrized deep neural network estimates are defined where all the weights are learned by gradient descent. It is shown that the expected $L_2$ error of these estimates converges to zero with a rate close to $n^{-1/(1+d)}$ in case that the regression function is Hölder smooth with Hölder exponent $p \in [1/2,1]$. In case of an interaction model where the regression function is assumed to be a sum of Hölder smooth functions, each of which depends only on $d^*$ of the $d$ components of the design variable, it is shown that these estimates achieve the corresponding $d^*$-dimensional rate of convergence.
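To make the gap between the two rates concrete, the short calculation below (a back-of-the-envelope sketch that ignores all constants and logarithmic factors, which the stated rate "close to $n^{-1/(1+d)}$" hides) inverts $n^{-1/(1+d)} \le 1/k$ to the sample size $n = k^{1+d}$ needed for accuracy $1/k$. The helper name `required_n` is hypothetical.

```python
def required_n(k: int, dim: int) -> int:
    """Smallest sample size n with n**(-1/(1+dim)) <= 1/k, i.e. n = k**(1+dim).
    Constants and logarithmic factors of the actual bound are ignored."""
    return k ** (1 + dim)

# Target accuracy 1/10 with the full dimension d ...
for d in (1, 2, 5, 10):
    print(f"d = {d:2d}: n ~ {required_n(10, d):,}")
# ... versus an interaction model where each summand depends on only d* = 1 components.
print(f"d* = 1: n ~ {required_n(10, 1):,}")
```

Under these simplifications, the interaction model needs the same order of samples as a one-dimensional problem, however large $d$ is.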
Submitted 4 October, 2022;
originally announced October 2022.
-
Evolving Microstructures in Relaxed Continuum Damage Mechanics for Strain Softening
Authors:
Maximilian Köhler,
Daniel Balzani
Abstract:
A new relaxation approach is proposed which allows for the description of stress- and strain-softening at finite strains. The model is based on the construction of a convex hull replacing the originally non-convex incremental stress potential which in turn represents damage in terms of the classical $(1-D)$ approach. This convex hull is given as the linear convex combination of weakly and strongly damaged phases and thus, it represents the homogenization of a microstructure bifurcated in the two phases. As a result, damage evolves in the convexified regime mainly by an increasing volume fraction of the strongly damaged phase. In contrast to previous relaxed incremental formulations in Gürses and Miehe [16] and Balzani and Ortiz [2], where the convex hull has been kept fixed after construction, here, the strongly damaged phase is allowed to elastically unload upon further loading. At the same time, its volume fraction increases nonlinearly within the convexified regime. Thus, strain-softening in the sense of a decreasing stress with increasing strain can be modeled. The major advantage of the proposed approach is that it ensures mesh-independent structural simulations without the requirement of additional length-scale related parameters or nonlocal quantities, which simplifies an implementation using classical material subroutine interfaces. In this paper, the focus is on the relaxation of one-dimensional models for fiber damage which are combined with a microsphere approach to allow for the description of three-dimensional fiber dispersions appearing in fibrous materials such as soft biological tissues. Several numerical examples are analyzed to show the overall response of the model and the mesh-independence of resulting structural calculations.
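In one dimension, the convex hull of a sampled potential can be computed with a lower-hull sweep, and inside each convexified interval the relaxed value is a convex combination of the two endpoint states, with the mixing weight playing the role of a volume fraction. The sketch below assumes numpy and uses a hypothetical double-well potential as a stand-in; it is not the $(1-D)$ damage potential or the hull-evolution rule of the paper.

```python
import numpy as np

def lower_convex_envelope(x, psi):
    """Lower convex envelope of samples (x[i], psi[i]) with x strictly increasing.
    Returns the envelope evaluated at x and the indices of the hull vertices."""
    hull = []
    for i in range(len(x)):
        # Drop the last vertex while it does not lie strictly below the chord hull[-2] -> i.
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            cross = (x[k] - x[j]) * (psi[i] - psi[j]) - (psi[k] - psi[j]) * (x[i] - x[j])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], psi[hull]), hull

# Hypothetical non-convex potential (double well) sampled on a grid.
x = np.linspace(-2.0, 2.0, 401)
psi = (x ** 2 - 1.0) ** 2
env, hull = lower_convex_envelope(x, psi)

# The widest gap between hull vertices is the convexified interval [x_a, x_b].
s = int(np.argmax(np.diff(x[hull])))
x_a, x_b = x[hull[s]], x[hull[s + 1]]
# A state x0 inside it is the mixture (1 - lam) * x_a + lam * x_b of the two "phases".
x0 = 0.25
lam = (x0 - x_a) / (x_b - x_a)
print(x_a, x_b, lam)
```

In this toy case the two phases sit at the wells $x=\pm 1$, and moving $x_0$ to the right only changes the weight `lam`.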
Submitted 31 August, 2022;
originally announced August 2022.
-
On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent
Authors:
Selina Drews,
Michael Kohler
Abstract:
Estimation of a multivariate regression function from independent and identically distributed data is considered. An estimate is defined which fits a deep neural network consisting of a large number of fully connected neural networks, which are computed in parallel, via gradient descent to the data. The estimate is over-parametrized in the sense that the number of its parameters is much larger than the sample size. It is shown that in case of a suitable random initialization of the network, a suitably small stepsize of the gradient descent, and a number of gradient descent steps which is slightly larger than the reciprocal of the stepsize of the gradient descent, the estimate is universally consistent in the sense that its expected $L_2$ error converges to zero for all distributions of the data where the response variable is square integrable.
Submitted 30 August, 2022;
originally announced August 2022.
-
Data-driven distributed MPC of dynamically coupled linear systems
Authors:
Matthias Köhler,
Julian Berberich,
Matthias A. Müller,
Frank Allgöwer
Abstract:
In this paper, we present a data-driven distributed model predictive control (MPC) scheme to stabilise the origin of dynamically coupled discrete-time linear systems subject to decoupled input constraints. The local optimisation problems solved by the subsystems rely on a distributed adaptation of the Fundamental Lemma by Willems et al., which allows system trajectories to be parametrised using only measured input-output data without explicit model knowledge. For the local predictions, the subsystems rely on communicated assumed trajectories of neighbours. Each subsystem guarantees a small deviation from these trajectories via a consistency constraint. We provide a theoretical analysis of the resulting non-iterative distributed MPC scheme, including proofs of recursive feasibility and (practical) stability. Finally, the approach is successfully applied to a numerical example.
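For orientation, the centralised Fundamental Lemma says that, for a controllable LTI system excited by a persistently exciting input, every length-$L$ input-output trajectory lies in the column span of a block-Hankel matrix built from one measured trajectory. The sketch below (numpy; the system matrices are hypothetical and serve only to generate data) checks this numerically. It does not reproduce the distributed adaptation or the MPC scheme of the paper.

```python
import numpy as np

# Hypothetical 2-state, single-input single-output LTI system (used only to generate data).
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n_states = 2

def simulate(u, x0):
    """Return outputs y (shape (T, 1)) for inputs u (shape (T, 1)) from state x0."""
    x = np.array(x0, dtype=float)
    ys = []
    for uk in u:
        ys.append(C @ x + D @ uk)
        x = A @ x + B @ uk
    return np.array(ys)

def hankel(w, L):
    """Block-Hankel matrix with L block rows; column j stacks w[j], ..., w[j+L-1]."""
    T, q = w.shape
    cols = T - L + 1
    return np.vstack([w[i:i + cols].T for i in range(L)])  # shape (L*q, cols)

rng = np.random.default_rng(0)
T, L = 60, 5

# Offline data: a random input is (generically) persistently exciting of order L + n.
u_d = rng.standard_normal((T, 1))
y_d = simulate(u_d, np.zeros(n_states))
H = hankel(np.hstack([u_d, y_d]), L)

# A new length-L trajectory from an arbitrary initial state ...
u_new = rng.standard_normal((L, 1))
y_new = simulate(u_new, rng.standard_normal(n_states))
w_new = np.hstack([u_new, y_new]).reshape(-1)  # time-major: [u0, y0, u1, y1, ...]

# ... is reproduced as H @ g for some coefficient vector g, without using A, B, C, D.
g, *_ = np.linalg.lstsq(H, w_new, rcond=None)
resid = np.linalg.norm(H @ g - w_new)
print(np.linalg.matrix_rank(H), resid)  # rank m*L + n = 7, residual near zero
```

In a data-driven MPC, the vector `g` becomes the decision variable, so predictions are formed from `H` alone.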
Submitted 11 August, 2023; v1 submitted 25 February, 2022;
originally announced February 2022.
-
On the rate of convergence of a classifier based on a Transformer encoder
Authors:
Iryna Gurevych,
Michael Kohler,
Gözde Gül Sahin
Abstract:
Pattern recognition based on a high-dimensional predictor is considered. A classifier is defined which is based on a Transformer encoder. The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed. It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model. Furthermore, the differences between the Transformer classifiers analyzed theoretically in this paper and the Transformer classifiers used nowadays in practice are illustrated by considering classification problems in natural language processing.
Submitted 29 November, 2021;
originally announced November 2021.
-
Convergence rates for shallow neural networks learned by gradient descent
Authors:
Alina Braun,
Michael Kohler,
Sophie Langer,
Harro Walk
Abstract:
In this paper we analyze the $L_2$ error of neural network regression estimates with one hidden layer. Under the assumption that the Fourier transform of the regression function decays suitably fast, we show that an estimate, where all initial weights are chosen according to proper uniform distributions and where the weights are learned by gradient descent, achieves a rate of convergence of $1/\sqrt{n}$ (up to a logarithmic factor). Our statistical analysis implies that the key aspect behind this result is the proper choice of the initial inner weights and the adjustment of the outer weights via gradient descent. This indicates that we can also simply use linear least squares to choose the outer weights. We prove a corresponding theoretical result and compare our new linear least squares neural network estimate with standard neural network estimates via simulated data. Our simulations show that our theoretical considerations lead to an estimate with an improved performance in many cases.
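The linear least squares variant can be sketched in a few lines: draw the inner weights once from uniform distributions, keep them fixed, and fit only the outer weights by least squares. In the sketch below (numpy), the logistic activation, the range $[-R, R]$, the number of neurons `K`, and the simulated target are all assumptions for illustration, not the choices analyzed in the paper.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_random_inner_weights(X, y, K=60, R=10.0, seed=0):
    """One-hidden-layer network: inner weights drawn once from U[-R, R] and kept fixed,
    outer weights (plus an intercept) fitted by linear least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-R, R, size=(X.shape[1], K))   # inner weights (hypothetical range)
    b = rng.uniform(-R, R, size=K)                 # inner biases (hypothetical range)

    def features(Z):
        return np.hstack([np.ones((len(Z), 1)), logistic(Z @ W + b)])

    beta, *_ = np.linalg.lstsq(features(X), y, rcond=None)  # outer weights
    return lambda Z: features(Z) @ beta

# Usage on hypothetical simulated data.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(500)
predict = fit_random_inner_weights(X, y)
X_test = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
mse = np.mean((predict(X_test) - np.sin(2 * np.pi * X_test[:, 0])) ** 2)
print(mse)
```

Because the inner weights are fixed, fitting reduces to a convex problem with a closed-form solution, which is the practical appeal of this variant.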
Submitted 18 August, 2023; v1 submitted 20 July, 2021;
originally announced July 2021.
-
Estimation of a regression function on a manifold by fully connected deep neural networks
Authors:
Michael Kohler,
Sophie Langer,
Ulrich Reif
Abstract:
Estimation of a regression function from independent and identically distributed data is considered. The $L_2$ error with integration with respect to the distribution of the predictor variable is used as the error criterion. The rate of convergence of least squares estimates based on fully connected spaces of deep neural networks with ReLU activation function is analyzed for smooth regression functions. It is shown that in case that the distribution of the predictor variable is concentrated on a manifold, these estimates achieve a rate of convergence which depends on the dimension of the manifold and not on the number of components of the predictor variable.
Submitted 20 July, 2021;
originally announced July 2021.
-
On the density estimation problem for uncertainty propagation with unknown input distributions
Authors:
Sebastian Kersting,
Michael Kohler
Abstract:
In this article we study the problem of quantifying the uncertainty in an experiment with a technical system. We propose new density estimates which combine observed data of the technical system and simulated data from an (imperfect) simulation model based on estimated input distributions. We analyze the rate of convergence of these estimates. The finite sample size performance of the estimates is illustrated by applying them to simulated data. The practical usefulness of the newly proposed estimates is demonstrated by using them to predict the uncertainty of a lateral vibration attenuation system with piezo-elastic supports.
Submitted 18 December, 2020;
originally announced December 2020.
-
Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss under the hierarchical max-pooling model
Authors:
Michael Kohler,
Sophie Langer
Abstract:
Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to also improve the theoretical understanding of neural networks. Nevertheless, this understanding seems limited when these networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analyzing the rate of the excess risk of a CNN classifier trained by cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with the practical observations about CNNs.
Submitted 29 April, 2024; v1 submitted 27 November, 2020;
originally announced November 2020.
-
Matrix Equations, Sparse Solvers: M-M.E.S.S.-2.0.1 -- Philosophy, Features and Application for (Parametric) Model Order Reduction
Authors:
Peter Benner,
Martin Köhler,
Jens Saak
Abstract:
Matrix equations are omnipresent in (numerical) linear algebra and systems theory. Especially in model order reduction (MOR) they play a key role in many balancing based reduction methods for linear dynamical systems. When these systems arise from spatial discretizations of evolutionary partial differential equations, their coefficient matrices are typically large and sparse. Moreover, the numbers of inputs and outputs of these systems are typically far smaller than the number of spatial degrees of freedom. Then, in many situations the solutions of the corresponding large-scale matrix equations are observed to have low (numerical) rank. This feature is exploited by M-M.E.S.S. to find successively larger low-rank factorizations approximating the solutions. This contribution describes the basic philosophy behind the implementation and the features of the package, as well as its application in the model order reduction of large-scale linear time-invariant (LTI) systems and parametric LTI systems.
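A standard low-rank solver of this kind is the low-rank ADI iteration for the Lyapunov equation $AX + XA^T + BB^T = 0$, which appends a few columns to a factor $Z$ per step so that $X \approx ZZ^T$. The Python sketch below implements the real-shift, residual-factor form of that iteration; it is not the MATLAB/Octave API of M-M.E.S.S., and the log-spaced shift heuristic and the test matrix are assumptions chosen for illustration.

```python
import numpy as np

def lr_adi(A, B, shifts, tol=1e-10, maxiter=100):
    """Low-rank ADI sketch for A X + X A^T + B B^T = 0 with real shifts p < 0.
    Returns Z with X approximately Z Z^T; the residual of Z Z^T equals W W^T."""
    n = A.shape[0]
    I = np.eye(n)
    W = np.array(B, dtype=float)           # residual factor: initial residual is B B^T
    target = tol * np.linalg.norm(B, 2) ** 2
    blocks = []
    for k in range(maxiter):
        p = shifts[k % len(shifts)]        # cycle through the given shifts
        V = np.linalg.solve(A + p * I, W)
        W = W - 2.0 * p * V                # W_k = (A - p I)(A + p I)^{-1} W_{k-1}
        blocks.append(np.sqrt(-2.0 * p) * V)
        if np.linalg.norm(W, 2) ** 2 <= target:   # ||W W^T||_2 = ||W||_2^2
            break
    return np.hstack(blocks)

def logspaced_shifts(A, J=10):
    """Heuristic real shifts, log-spaced over the spectrum of a symmetric negative definite A."""
    ev = np.linalg.eigvalsh(A)
    return -np.logspace(np.log10(-ev.max()), np.log10(-ev.min()), J)

# Usage: stable 1D Laplacian stencil with a single input column.
n = 100
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.ones((n, 1))
Z = lr_adi(A, B, logspaced_shifts(A))
X = Z @ Z.T
rel_res = np.linalg.norm(A @ X + X @ A.T + B @ B.T, 2) / np.linalg.norm(B @ B.T, 2)
print(Z.shape[1], rel_res)
```

With one input column, the factor $Z$ stays far narrower than $n$ while the residual drops below the tolerance, which is the low-rank behaviour described above.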
Submitted 9 May, 2020; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Analysis of the rate of convergence of neural network regression estimates which are easy to implement
Authors:
Alina Braun,
Michael Kohler,
Adam Krzyzak
Abstract:
Recent results in nonparametric regression show that for deep learning, i.e., for neural network estimates with many hidden layers, we are able to achieve good rates of convergence even in case of high-dimensional predictor variables, provided suitable assumptions on the structure of the regression function are imposed. The estimates are defined by minimizing the empirical $L_2$ risk over a class of neural networks. In practice it is not clear how this can be done exactly. In this article we introduce a new neural network regression estimate where most of the weights are chosen regardless of the data, motivated by some recent approximation results for neural networks, and which is therefore easy to implement. We show that for this estimate we can derive rate of convergence results in case the regression function is smooth. We combine this estimate with projection pursuit, where we choose the directions randomly, and we show that for sufficiently many repetitions we get a neural network regression estimate which is easy to implement and which achieves the one-dimensional rate of convergence (up to some logarithmic factor) in case that the regression function satisfies the assumptions of projection pursuit.
Submitted 9 December, 2019;
originally announced December 2019.
-
Over-parametrized deep neural networks do not generalize well
Authors:
Michael Kohler,
Adam Krzyzak
Abstract:
Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting, and a lower bound is presented which proves that these networks do not generalize well on new data in the sense that they do not achieve the optimal minimax rate of convergence for estimation of smooth regression functions.
Submitted 14 January, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
On the rate of convergence of a neural network regression estimate learned by gradient descent
Authors:
Alina Braun,
Michael Kohler,
Harro Walk
Abstract:
Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical $L_2$ risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient descent procedure is repeated several times with randomly chosen starting values for the weights, and from the list of constructed estimates the one with the minimal empirical $L_2$ risk is chosen. Under the assumption that the number of randomly chosen starting values and the number of steps for gradient descent are sufficiently large, it is shown that the resulting estimate achieves (up to a logarithmic factor) the optimal rate of convergence in a projection pursuit model. The finite sample size performance of the estimates is illustrated by using simulated data.
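The restart strategy works as follows: run gradient descent from several random starting values and keep the estimate with the smallest penalized empirical risk. The sketch below is an illustration under assumptions. It uses a one-dimensional input, three logistic neurons, a squared penalty on the output weights, a fixed step size and a fixed number of steps. The helper names are hypothetical, and the paper's actual penalty, initialization ranges and step counts are not reproduced.

```python
import math
import random


def sigmoid(z):
    # Overflow-safe logistic activation.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(params, x):
    # One hidden layer: f(x) = c0 + sum_k c_k * sigmoid(a_k * x + b_k).
    c0, hidden = params
    return c0 + sum(c * sigmoid(a * x + b) for a, b, c in hidden)


def penalized_risk(params, xs, ys, lam):
    # Empirical L2 risk plus an (assumed) squared penalty on the output weights c_k.
    risk = sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return risk + lam * sum(c * c for _, _, c in params[1])


def gradient_descent(params, xs, ys, lam, lr, steps):
    n = len(xs)
    c0, hidden = params[0], [list(h) for h in params[1]]
    for _ in range(steps):
        g0 = 0.0
        grads = [[0.0, 0.0, 0.0] for _ in hidden]  # d/da_k, d/db_k, d/dc_k
        for x, y in zip(xs, ys):
            acts = [sigmoid(a * x + b) for a, b, _ in hidden]
            r = c0 + sum(h[2] * s for h, s in zip(hidden, acts)) - y
            g0 += 2.0 * r / n
            for g, h, s in zip(grads, hidden, acts):
                inner = 2.0 * r / n * h[2] * s * (1.0 - s)
                g[0] += inner * x
                g[1] += inner
                g[2] += 2.0 * r / n * s
        c0 -= lr * g0
        for g, h in zip(grads, hidden):
            g[2] += 2.0 * lam * h[2]  # gradient of the penalty term
            for j in range(3):
                h[j] -= lr * g[j]
    return c0, [tuple(h) for h in hidden]


def best_of_restarts(xs, ys, n_starts=5, n_hidden=3, lam=1e-4, lr=0.1, steps=200, seed=0):
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        # Random starting values for the weights (ranges are illustrative only).
        start = (0.0, [(rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-1, 1))
                       for _ in range(n_hidden)])
        fitted = gradient_descent(start, xs, ys, lam, lr, steps)
        results.append((penalized_risk(fitted, xs, ys, lam), fitted))
    # Keep the run with the smallest penalized empirical risk.
    best_risk, best = min(results, key=lambda t: t[0])
    return best, best_risk, [risk for risk, _ in results]


xs = [i / 39 for i in range(40)]
ys = [math.sin(2 * math.pi * x) for x in xs]
best, best_risk, all_risks = best_of_restarts(xs, ys)
```

The restarts guard against a single unlucky initialization. The selection step only compares empirical risks, so it adds almost no cost beyond the gradient descent runs themselves.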
Submitted 9 December, 2019;
originally announced December 2019.
-
Estimation of a function of low local dimensionality by deep neural networks
Authors:
Michael Kohler,
Adam Krzyzak,
Sophie Langer
Abstract:
Deep neural networks (DNNs) achieve impressive results for complicated tasks like object detection on images and speech recognition. Motivated by this practical success, there is now a strong interest in showing good theoretical properties of DNNs. To describe for which tasks DNNs perform well and when they fail, it is a key challenge to understand their performance. The aim of this paper is to contribute to the current statistical theory of DNNs. We apply DNNs to high-dimensional data and we show that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case the regression function has low local dimensionality. Consequently, the rate of convergence of the estimate does not depend on the input dimension $d$, but on the local dimension $d^*$, and the DNNs are able to circumvent the curse of dimensionality in case $d^*$ is much smaller than $d$. In our simulation study we provide numerical experiments to support our theoretical result and we compare our estimate with other conventional nonparametric regression estimates. The performance of our estimates is also validated in experiments with real data.
Submitted 15 June, 2020; v1 submitted 29 August, 2019;
originally announced August 2019.
-
On the rate of convergence of fully connected very deep neural network regression estimates
Authors:
Michael Kohler,
Sophie Langer
Abstract:
Recent results in nonparametric regression show that deep learning, i.e., neural network estimates with many hidden layers, are able to circumvent the so-called curse of dimensionality in case that suitable restrictions on the structure of the regression function hold. One key feature of the neural networks used in these results is that their network architecture has a further constraint, namely the network sparsity. In this paper we show that we can get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions. Here either the number of neurons per hidden layer is fixed and the number of hidden layers tends to infinity suitably fast for sample size tending to infinity, or the number of hidden layers is bounded by some logarithmic factor in the sample size and the number of neurons per hidden layer tends to infinity suitably fast for sample size tending to infinity. The proof is based on new approximation results concerning deep neural networks.
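The sketch below makes the two architecture regimes concrete. It counts the weights of a fully connected ReLU network and evaluates it. "Fully connected" here means every neuron of one layer feeds every neuron of the next, with no sparsity constraint. The helper names are hypothetical and the example sizes are arbitrary. The growth rates of depth and width required by the paper are not reproduced.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def n_params(d, depth, width):
    """Weights plus biases of a fully connected ReLU net: d inputs, `depth` hidden
    layers of `width` neurons each, one linear output neuron."""
    return (d + 1) * width + (depth - 1) * (width + 1) * width + (width + 1)


def random_net(d, depth, width, rng):
    # Every layer is dense: a full (out x in) weight matrix plus a bias vector.
    sizes = [d] + [width] * depth + [1]
    return [([[rng.gauss(0.0, 1.0) for _ in range(sizes[i])] for _ in range(sizes[i + 1])],
             [rng.gauss(0.0, 1.0) for _ in range(sizes[i + 1])])
            for i in range(len(sizes) - 1)]


def forward(layers, x):
    h = list(x)
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        # ReLU on hidden layers, identity on the output layer.
        h = z if idx == len(layers) - 1 else [relu(v) for v in z]
    return h[0]


# Regime 1: fixed width, growing depth.  Regime 2: fixed depth, growing width.
deep_narrow = [n_params(d=5, depth=L, width=8) for L in (4, 8, 16)]
shallow_wide = [n_params(d=5, depth=3, width=r) for r in (8, 16, 32)]
net = random_net(5, 4, 8, random.Random(0))
```

For $d$ inputs, $L$ hidden layers of width $r$ and a scalar output, the count is $(d+1)r + (L-1)(r+1)r + (r+1)$. For fixed width this grows linearly in the depth; for fixed depth $L \ge 2$ it grows quadratically in the width.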
Submitted 29 September, 2020; v1 submitted 29 August, 2019;
originally announced August 2019.
-
Sub-sampled Cubic Regularization for Non-convex Optimization
Authors:
Jonas Moritz Kohler,
Aurelien Lucchi
Abstract:
We consider the minimization of non-convex functions that typically arise in machine learning. Specifically, we focus our attention on a variant of trust region methods known as cubic regularization. This approach is particularly attractive because it escapes strict saddle points and it provides stronger convergence guarantees than first- and second-order as well as classical trust region methods. However, it suffers from a high computational complexity that makes it impractical for large-scale learning. Here, we propose a novel method that uses sub-sampling to lower this computational cost. By the use of concentration inequalities we provide a sampling scheme that gives sufficiently accurate gradient and Hessian approximations to retain the strong global and local convergence guarantees of cubically regularized methods. To the best of our knowledge this is the first work that gives global convergence guarantees for a sub-sampled variant of cubic regularization on non-convex functions. Furthermore, we provide experimental results supporting our theory.
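A minimal one-dimensional sketch can illustrate the mechanism. It estimates the gradient and the Hessian from random subsets of a finite sum, then takes the exact minimizer of the cubic model $m(s) = g s + \tfrac{1}{2} h s^2 + \tfrac{\sigma}{3}|s|^3$ as the step. Several parts are assumptions. The toy objective $f(w) = \frac{1}{n}\sum_i (w^2 - a_i)^2/4$ is hypothetical; it has a stationary point with negative curvature at $w = 0$. The sketch also uses a fixed $\sigma$ and fixed sample sizes. The paper instead derives sample sizes from concentration inequalities, and cubic regularization methods typically adapt $\sigma$.

```python
import math
import random


def cubic_step(g, h, sigma):
    """Global minimiser of the 1-D cubic model m(s) = g*s + h*s^2/2 + sigma*|s|^3/3."""
    def model(s):
        return g * s + 0.5 * h * s * s + sigma * abs(s) ** 3 / 3.0

    candidates = [0.0]
    # Stationary points with s >= 0 solve sigma*s^2 + h*s + g = 0.
    disc = h * h - 4.0 * sigma * g
    if disc >= 0.0:
        for root in ((-h + math.sqrt(disc)) / (2 * sigma), (-h - math.sqrt(disc)) / (2 * sigma)):
            if root >= 0.0:
                candidates.append(root)
    # Stationary points with s <= 0 solve sigma*s^2 - h*s - g = 0.
    disc = h * h + 4.0 * sigma * g
    if disc >= 0.0:
        for root in ((h + math.sqrt(disc)) / (2 * sigma), (h - math.sqrt(disc)) / (2 * sigma)):
            if root <= 0.0:
                candidates.append(root)
    return min(candidates, key=model)


def subsampled_cubic_regularization(a, w0, sigma=1.0, batch=50, iters=30, seed=0):
    """Toy finite sum f(w) = mean_i (w^2 - a_i)^2 / 4 with sub-sampled derivatives."""
    rng = random.Random(seed)
    w = w0
    for _ in range(iters):
        grad_idx = rng.sample(range(len(a)), batch)  # sample for the gradient
        hess_idx = rng.sample(range(len(a)), batch)  # independent sample for the Hessian
        g = sum(w * (w * w - a[i]) for i in grad_idx) / batch
        h = sum(3.0 * w * w - a[i] for i in hess_idx) / batch
        w += cubic_step(g, h, sigma)
    return w


rng = random.Random(1)
a = [rng.uniform(1.0, 3.0) for _ in range(200)]
# Start at w = 0, where the gradient vanishes but the curvature is negative.
w = subsampled_cubic_regularization(a, w0=0.0)
```

Plain gradient descent started at $w = 0$ would not move, because the gradient is exactly zero there. The cubic step exploits the negative curvature $h < 0$ and moves towards one of the minimizers $\pm\sqrt{\bar a}$.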
Submitted 1 July, 2017; v1 submitted 16 May, 2017;
originally announced May 2017.
-
On data-based optimal stopping under stationarity and ergodicity
Authors:
Michael Kohler,
Harro Walk
Abstract:
The problem of optimal stopping with finite horizon in discrete time is considered in view of maximizing the expected gain. The algorithm proposed in this paper is completely nonparametric in the sense that it uses observed data from the past of the process up to time $-n+1$, $n\in\mathbb{N}$, not relying on any specific model assumption. Kernel regression estimation of conditional expectations and prediction theory of individual sequences are used as tools. It is shown that the algorithm is universally consistent: the achieved expected gain converges to the optimal value for $n\to\infty$ whenever the underlying process is stationary and ergodic. An application to exercising American options is given, and the algorithm is illustrated by simulated data.
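One building block named here is kernel regression estimation of a conditional expectation. The sketch below shows a Nadaraya-Watson estimate with the naive (box) kernel and a simple rule that stops when the current gain is at least the estimated value of continuing. It rests on assumptions: the function names, the bandwidth `h` and the form of the stopping rule are hypothetical. The paper's combination with prediction of individual sequences is not reproduced.

```python
def naive_kernel_regression(xs, ys, x, h):
    """Nadaraya-Watson estimate of E[Y | X = x] with kernel K(u) = 1{|u| <= 1}."""
    num, den = 0.0, 0
    for xi, yi in zip(xs, ys):
        if abs(x - xi) <= h:  # past observation lies in the window around x
            num += yi
            den += 1
    return num / den if den > 0 else 0.0  # convention 0/0 := 0


def stop_now(current_gain, past_states, past_continuation_values, state, h):
    """Hypothetical rule: stop if the current gain is at least the estimated continuation value."""
    continuation = naive_kernel_regression(past_states, past_continuation_values, state, h)
    return current_gain >= continuation


xs = [i / 10 for i in range(11)]  # hypothetical past states
cont = [2 * x for x in xs]        # hypothetical past continuation values
est = naive_kernel_regression(xs, cont, 0.5, 0.15)
```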
Submitted 23 July, 2013;
originally announced July 2013.
-
Optimal $RH_2$-- and $RH_\infty$--Approximation of Unstable Descriptor Systems
Authors:
Marcus Köhler
Abstract:
Stability preserving is an important topic in approximation of systems, e.g. model reduction. If the original system is stable, we often want the approximation to be stable. But even if an algorithm preserves stability, the resulting system could be unstable in practice because of round-off errors. Our approach is to approximate this unstable reduced system by a stable system. More precisely, we consider the following problem. Given an unstable linear time-invariant continuous-time descriptor system with transfer function $G$, find a stable one whose transfer function is the best approximation of $G$ in the spaces $RH_2$ and $RH_\infty$, respectively. Explicit optimal solutions are presented under consideration of numerical issues.
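The $RH_2$ case can be sketched in its simplest setting. Take a strictly proper $G$ with distinct simple poles and no poles on the imaginary axis. Then $L_2(j\mathbb{R}) = RH_2 \oplus RH_2^\perp$. The best $RH_2$ approximation is therefore the stable part $G_s$ of the partial-fraction decomposition $G = G_s + G_u$, and the squared error is $\|G_u\|_2^2$. This sketch does not claim to be the paper's construction. It omits the polynomial part that descriptor systems may have, and it does not cover the $RH_\infty$ problem. The helper names are hypothetical.

```python
def split_stable_antistable(residues, poles):
    """Split G(s) = sum_k r_k / (s - p_k) into stable and antistable terms."""
    stable, antistable = [], []
    for r, p in zip(residues, poles):
        if p.real < 0:
            stable.append((r, p))
        elif p.real > 0:
            antistable.append((r, p))
        else:
            raise ValueError("pole on the imaginary axis: G is not in L2")
    return stable, antistable


def evaluate(terms, s):
    # Value of sum_k r_k / (s - p_k) at the complex point s.
    return sum(r / (s - p) for r, p in terms)


def h2_norm_sq(terms, antistable):
    """Squared L2(jR) norm of sum_k r_k / (s - p_k), all poles in one open half-plane.
    Uses <1/(s-p), 1/(s-q)> = -1/(p + conj(q)) (stable) or 1/(p + conj(q)) (antistable)."""
    total = 0.0
    for rk, pk in terms:
        for rl, pl in terms:
            denom = pk + pl.conjugate()
            total += rk * rl.conjugate() / (denom if antistable else -denom)
    return total.real


# G(s) = 1/(s + 1) + 2/(s - 3): keep the stable term; the antistable term is the error.
stable, anti = split_stable_antistable([1, 2], [-1, 3])
err_sq = h2_norm_sq(anti, antistable=True)  # 2^2 / (2 * 3) = 2/3
```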
Submitted 1 August, 2012; v1 submitted 27 March, 2012;
originally announced March 2012.
-
Universal coefficient theorems for C*-algebras over finite topological spaces
Authors:
Rasmus Bentmann,
Manuel Köhler
Abstract:
We determine the class of finite $T_0$-spaces allowing for a universal coefficient theorem computing equivariant KK-theory by filtrated K-theory.
Submitted 7 April, 2011; v1 submitted 29 January, 2011;
originally announced January 2011.
-
A dynamic look-ahead Monte Carlo algorithm for pricing Bermudan options
Authors:
Daniel Egloff,
Michael Kohler,
Nebojsa Todorovic
Abstract:
Under the assumption of no-arbitrage, the pricing of American and Bermudan options can be cast as optimal stopping problems. We propose a new adaptive simulation-based algorithm for the numerical solution of optimal stopping problems in discrete time. Our approach is to recursively compute the so-called continuation values. They are defined as regression functions of the cash flow which would occur over a series of subsequent time periods if the approximated optimal exercise strategy were applied. We use nonparametric least squares regression estimates to approximate the continuation values from a set of sample paths which we simulate from the underlying stochastic process. The parameters of the regression estimates and the regression problems are chosen in a data-dependent manner. We present results concerning the consistency and rate of convergence of the new algorithm. Finally, we illustrate its performance by pricing high-dimensional Bermudan basket options with strangle-spread payoff based on the average of the underlying assets.
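The backward recursion for continuation values can be illustrated with the standard regression-based baseline of Longstaff-Schwartz type. This is a swapped-in technique, not the paper's dynamic look-ahead algorithm. It regresses discounted future cash flows on a fixed quadratic polynomial basis rather than on data-dependent nonparametric estimates. The sketch assumes a single asset following geometric Brownian motion and a Bermudan put. All parameter values and names are hypothetical.

```python
import math
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lsm_bermudan_put(s0=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0,
                     n_dates=10, n_paths=5000, seed=0):
    rng = random.Random(seed)
    dt = maturity / n_dates
    disc = math.exp(-rate * dt)
    drift = (rate - 0.5 * vol ** 2) * dt
    # Simulate geometric Brownian motion at the exercise dates dt, 2dt, ..., maturity.
    paths = []
    for _ in range(n_paths):
        s, path = s0, []
        for _ in range(n_dates):
            s *= math.exp(drift + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            path.append(s)
        paths.append(path)

    def payoff(s):
        return max(strike - s, 0.0)

    # Cash flow of each path if the option is held to the last date.
    cash = [payoff(p[-1]) for p in paths]
    for t in range(n_dates - 2, -1, -1):
        cash = [c * disc for c in cash]  # value of the future cash flow at date t
        itm = [i for i, p in enumerate(paths) if payoff(p[t]) > 0.0]
        if len(itm) < 3:
            continue
        # Regress discounted future cash flows on (1, x, x^2) with x = S_t / K.
        basis = [[1.0, paths[i][t] / strike, (paths[i][t] / strike) ** 2] for i in itm]
        A = [[sum(row[a] * row[b] for row in basis) for b in range(3)] for a in range(3)]
        rhs = [sum(row[a] * cash[i] for row, i in zip(basis, itm)) for a in range(3)]
        beta = solve(A, rhs)
        for row, i in zip(basis, itm):
            continuation = sum(bj * xj for bj, xj in zip(beta, row))
            if payoff(paths[i][t]) > continuation:
                cash[i] = payoff(paths[i][t])  # exercise replaces later cash flows
    return disc * sum(cash) / n_paths  # discount from the first date back to time 0


price = lsm_bermudan_put()
```

Only in-the-money paths enter the regression, because exercise is never optimal when the payoff is zero. The estimated continuation value is used only for the exercise decision. The realized cash flows, not the regression fit, are what is averaged into the price.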
Submitted 19 October, 2007;
originally announced October 2007.