Asymptotic Analysis of Deep Residual Networks

Authors: Rama Cont, Alain Rossier, Renyuan Xu

Abstract: We investigate the asymptotic properties of deep Residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE) or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of stochastic differential equations (SDEs). Finally, we derive the corresponding scaling limits for the backpropagation dynamics.

Submitted 25 January, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: 49 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2105.12245

MSC Class: 60F17; 60F25; 68T05

arXiv:2204.07261 [pdf, other]

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Authors: Rama Cont, Alain Rossier, Renyuan Xu

Abstract: We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index, admit a scaling limit as the depth increases, then the limit has finite $p$-variation with $p=2$. Proofs are based on non-asymptotic estimates for the loss function and for norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.

Submitted 25 January, 2023; v1 submitted 14 April, 2022; originally announced April 2022.

MSC Class: 65Kxx; 62M45; 68Q32; 68Txx

arXiv:2105.12245 [pdf, other]

Scaling Properties of Deep Residual Networks

Authors: Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu

Abstract: Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature. Depending on certain features of the network architecture, such as the smoothness of the activation function, one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit.

Submitted 10 June, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Published at ICML 2021

arXiv:1501.00014 [pdf, ps, other]

Optimal rounding under integer constraints

Authors: Rama Cont, Massoud Heidari

Abstract: Given real numbers whose sum is an integer, we study the problem of finding integers which match these real numbers as closely as possible, in the sense of L^p norm, while preserving the sum. We describe the structure of solutions for this integer optimization problem and propose an algorithm with complexity O(N log N) for solving it. In contrast to fractional rounding and randomized rounding, which yield biased estimators of the solution when applied to this problem, our method yields an exact solution which minimizes the relative rounding error across the set of all solutions for any value of p greater than 1, while avoiding the complexity of exhaustive search. The proposed algorithm also solves a class of integer optimization problems with integer constraints and may be used as the rounding step of relaxed integer programming problems, for rounding real-valued solutions. We give several examples of applications for the proposed algorithm.

Submitted 30 December, 2014; originally announced January 2015.

MSC Class: 90C10; 90C27

ACM Class: G.1.6
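
To make the sum-preserving rounding problem in the abstract above concrete, here is a minimal sketch of the classical largest-remainder approach: round everything down, then hand the missing units to the entries with the largest fractional parts. This is an illustration under stated assumptions, not the algorithm proposed in the paper; it targets absolute rather than relative rounding error, and the function name `round_preserving_sum` is hypothetical. The sort gives it O(N log N) cost.

```python
import math


def round_preserving_sum(values):
    """Largest-remainder rounding (illustrative sketch, not the paper's method).

    Assumes the inputs sum to an integer, up to floating-point error.
    Returns integers with the same total as `values`.
    """
    target = round(sum(values))                  # the integer total to preserve
    floors = [math.floor(v) for v in values]     # round every entry down first
    deficit = target - sum(floors)               # units still to be distributed

    # Indices ordered by descending fractional part: O(N log N) because of the sort.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] - floors[i],
        reverse=True,
    )

    result = list(floors)
    for i in order[:deficit]:                    # give one unit to the largest remainders
        result[i] += 1
    return result


if __name__ == "__main__":
    print(round_preserving_sum([1.2, 2.3, 3.5]))
```

Each output differs from its input by less than 1, and the total is unchanged; ties between equal fractional parts are broken by input order here, which is a design choice of this sketch rather than anything stated in the abstract.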

Showing 1–4 of 4 results for author: Cont, R