-
Asymptotic Analysis of Deep Residual Networks
Authors:
Rama Cont,
Alain Rossier,
Renyuan Xu
Abstract:
We investigate the asymptotic properties of deep residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE) or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of stochastic differential equations (SDEs). Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
Submitted 25 January, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
Authors:
Rama Cont,
Alain Rossier,
Renyuan Xu
Abstract:
We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite $p$-variation with $p=2$. Proofs are based on non-asymptotic estimates for the loss function and for norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
Submitted 25 January, 2023; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Scaling Properties of Deep Residual Networks
Authors:
Alain-Sam Cohen,
Rama Cont,
Alain Rossier,
Renyuan Xu
Abstract:
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.
Submitted 10 June, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
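The link between ResNets and neural ODEs mentioned in this abstract can be sketched as follows. The notation ($h_k$ for the hidden state at layer $k$, $\theta_k$ for the layer weights, $f$ for the residual map, $L$ for the depth) is assumed here for illustration and is not taken from the paper. A residual update with step size $1/L$ reads

$$h_{k+1} = h_k + \frac{1}{L}\, f(h_k, \theta_k), \qquad k = 0, \dots, L-1.$$

If the weights satisfy $\theta_k = \theta(k/L)$ for some smooth function $\theta$ on $[0,1]$, this recursion is the explicit Euler scheme with step $1/L$ for

$$\frac{dh_t}{dt} = f(h_t, \theta(t)), \qquad t \in [0,1],$$

so the hidden states would be expected to approach the ODE solution as $L \to \infty$. The smoothness of $\theta$ is the assumption that the abstract reports as not matching the trained weights observed in the experiments.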
-
Optimal rounding under integer constraints
Authors:
Rama Cont,
Massoud Heidari
Abstract:
Given real numbers whose sum is an integer, we study the problem of finding integers which match these real numbers as closely as possible, in the sense of the $L^p$ norm, while preserving the sum. We describe the structure of solutions for this integer optimization problem and propose an algorithm with complexity $O(N \log N)$ for solving it. In contrast to fractional rounding and randomized rounding, which yield biased estimators of the solution when applied to this problem, our method yields an exact solution which minimizes the relative rounding error across the set of all solutions for any value of $p$ greater than 1, while avoiding the complexity of exhaustive search. The proposed algorithm also solves a class of integer optimization problems with integer constraints and may be used as the rounding step of relaxed integer programming problems, for rounding real-valued solutions. We give several examples of applications for the proposed algorithm.
Submitted 30 December, 2014;
originally announced January 2015.
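To illustrate the kind of problem this abstract describes, here is a minimal Python sketch of sum-preserving rounding. It uses the classical largest-remainder scheme (floor every value, then round up the entries with the largest fractional parts), which is a swapped-in standard technique and not necessarily the algorithm proposed in the paper; in particular it targets absolute rather than relative rounding error. The function name `round_preserving_sum` is hypothetical.

```python
import math


def round_preserving_sum(values):
    """Round each value to an integer while keeping the total unchanged.

    Assumption: sum(values) is an integer up to floating-point error.
    This is the classical largest-remainder scheme, shown for illustration.
    """
    floors = [math.floor(v) for v in values]
    target = round(sum(values))
    # Number of entries that must be rounded up instead of down.
    deficit = target - sum(floors)
    # Sort indices by decreasing fractional part; this sort is the
    # O(N log N) step that dominates the running time.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] - floors[i],
        reverse=True,
    )
    result = list(floors)
    for i in order[:deficit]:
        result[i] += 1
    return result


print(round_preserving_sum([1.4, 2.3, 3.3]))  # [2, 2, 3]
```

Rounding each entry to the nearest integer independently would give [1, 2, 3] here, whose sum is 6 rather than 7; the sketch restores the total by rounding up the entry with the largest fractional part.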