Showing 1–12 of 12 results for author: Dherin, B

  1. arXiv:2505.13397  [pdf, other]

    cs.LG math.NA stat.ML

    Learning by solving differential equations

    Authors: Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Sourabh Medapati, Javier Gonzalvo

    Abstract: Modern deep learning algorithms use variations of gradient descent as their main learning methods. Gradient descent can be understood as the simplest Ordinary Differential Equation (ODE) solver; namely, the Euler method applied to the gradient flow differential equation. Since Euler, many ODE solvers have been devised that follow the gradient flow equation more precisely and more stably. Runge-Kut…

    Submitted 19 May, 2025; originally announced May 2025.

  2. arXiv:2502.01557  [pdf, other]

    cs.LG math.DS stat.ML

    Training in reverse: How iteration order influences convergence and stability in deep learning

    Authors: Benoit Dherin, Benny Avelin, Anders Karlsson, Hanna Mazzawi, Javier Gonzalvo, Michael Munn

    Abstract: Despite exceptional achievements, training neural networks remains computationally expensive and is often plagued by instabilities that can degrade convergence. While learning rate schedules can help mitigate these issues, finding optimal schedules is time-consuming and resource-intensive. This work explores theoretical issues concerning training stability in the constant-learning-rate (i.e., with…

    Submitted 3 February, 2025; originally announced February 2025.

  3. arXiv:2405.18590  [pdf, other]

    stat.ML cs.LG

    A Margin-based Multiclass Generalization Bound via Geometric Complexity

    Authors: Michael Munn, Benoit Dherin, Javier Gonzalvo

    Abstract: There has been considerable effort to better understand the generalization capabilities of deep neural networks both as a means to unlock a theoretical understanding of their success as well as providing directions for further improvements. In this paper, we investigate margin-based multiclass generalization bounds for neural networks which rely on a recent complexity measure, the geometric comple…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted as an ICML 2023 workshop paper (Topology, Algebra and Geometry in Machine Learning)

    Journal ref: Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), PMLR 221:189-205, 2023

  4. arXiv:2402.08818  [pdf, other]

    stat.ML cs.LG math.OC

    Corridor Geometry in Gradient-Based Optimization

    Authors: Benoit Dherin, Mihaela Rosca

    Abstract: We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines. We show that corridors provide insights into gradient-based optimization, since corridors are exactly the regions where gradient descent and the gradient flow follow the same trajectory, while the loss decreases linearly. As a result,…

    Submitted 13 February, 2024; originally announced February 2024.

  5. arXiv:2311.00235  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Implicit biases in multitask and continual learning from a backward error analysis perspective

    Authors: Benoit Dherin

    Abstract: Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent. In particular, we derive modified losses that are implicitly minimized during training. They have three terms: the original loss, accounting for convergence, an implicit flatness regularization term proportional to the learnin…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted in Mathematics of Modern Machine Learning Workshop at NeurIPS 2023

  6. arXiv:2307.00667  [pdf, other]

    stat.ML cs.AI cs.LG

    Morse Neural Networks for Uncertainty Quantification

    Authors: Benoit Dherin, Huiyi Hu, Jie Ren, Michael W. Dusenberry, Balaji Lakshminarayanan

    Abstract: We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes the unnormalized Gaussian densities to have modes of high-dimensional submanifolds instead of just discrete points. Fitting the Morse neural network via a KL-divergence loss yields 1) a (unnormalized) generative density, 2) an OOD detector, 3) a calibration temperature, 4) a…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: Accepted to ICML workshop on Structured Probabilistic Inference & Generative Modeling 2023

  7. arXiv:2302.01952  [pdf, other]

    stat.ML cs.LG

    On a continuous time model of gradient descent dynamics and instability in deep learning

    Authors: Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin

    Abstract: The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates…

    Submitted 13 September, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: Transactions of Machine Learning Research, 2023

  8. arXiv:2209.13083  [pdf, other]

    cs.LG stat.ML

    Why neural networks find simple solutions: the many regularizers of geometric complexity

    Authors: Benoit Dherin, Michael Munn, Mihaela Rosca, David G. T. Barrett

    Abstract: In many contexts, simpler models are preferable to more complex models and the control of this model complexity is the goal for many methods in machine learning such as regularization, hyperparameter tuning and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for de…

    Submitted 23 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: Accepted as a NeurIPS 2022 paper

  9. arXiv:2111.15090  [pdf, other]

    cs.LG stat.ML

    The Geometric Occam's Razor Implicit in Deep Learning

    Authors: Benoit Dherin, Michael Munn, David G. T. Barrett

    Abstract: In over-parameterized deep neural networks there can be many possible parameter configurations that fit the training data exactly. However, the properties of these interpolating solutions are poorly understood. We argue that over-parameterized neural networks trained with stochastic gradient descent are subject to a Geometric Occam's Razor; that is, these networks are implicitly regularized by the…

    Submitted 30 November, 2021; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted as a NeurIPS 2021 workshop paper (OPT2021)

  10. arXiv:2105.13922  [pdf, other]

    stat.ML cs.LG

    Discretization Drift in Two-Player Games

    Authors: Mihaela Rosca, Yan Wu, Benoit Dherin, David G. T. Barrett

    Abstract: Gradient-based methods for two-player games produce rich dynamics that can solve challenging problems, yet can be difficult to stabilize and understand. Part of this complexity originates from the discrete update steps given by simultaneous or alternating gradient descent, which causes each player to drift away from the continuous gradient flow -- a phenomenon we call discretization drift. Using b…

    Submitted 1 July, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

  11. arXiv:2101.12176  [pdf, other]

    cs.LG stat.ML

    On the Origin of Implicit Regularization in Stochastic Gradient Descent

    Authors: Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

    Abstract: For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of gradient flow on the full batch loss function. However moderately large learning rates can achieve higher test accuracies, and this generalization benefit is not explained by convergence bounds, since the learning rate which maximizes test accuracy is often larger than the learning rate which minimizes training…

    Submitted 28 January, 2021; originally announced January 2021.

    Comments: Accepted as a conference paper at ICLR 2021

  12. arXiv:2009.11162  [pdf, other]

    cs.LG stat.ML

    Implicit Gradient Regularization

    Authors: David G. T. Barrett, Benoit Dherin

    Abstract: Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size…

    Submitted 18 July, 2022; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: Correction to formula A.14 in Appendix A.1 and update to the acknowledgments

    Journal ref: Published as a conference paper at ICLR 2021