Automatic Gradient Descent: Deep Learning without Hyperparameters

Bernstein, Jeremy; Mingard, Chris; Huang, Kevin; Azizan, Navid; Yue, Yisong

arXiv:2304.05187 (cs)

[Submitted on 11 Apr 2023]

Abstract: The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology. Existing optimisation frameworks neglect this information in favour of implicit architectural information (e.g. second-order methods) or architecture-agnostic distance functions (e.g. mirror descent). Meanwhile, the most popular optimiser in practice, Adam, is based on heuristics. This paper builds a new framework for deriving optimisation algorithms that explicitly leverage neural architecture. The theory extends mirror descent to non-convex composite objective functions: the idea is to transform a Bregman divergence to account for the non-linear structure of neural architecture. Working through the details for deep fully-connected networks yields automatic gradient descent: a first-order optimiser without any hyperparameters. Automatic gradient descent trains both fully-connected and convolutional networks out of the box and at ImageNet scale. A PyTorch implementation is available at this https URL and also in Appendix B. Overall, the paper supplies a rigorous theoretical foundation for a next generation of architecture-dependent optimisers that work automatically and without hyperparameters.
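
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a hyperparameter-free, architecture-aware step for a bias-free fully-connected network could look like. It follows our reading of the paper's algorithm and should be checked against Appendix B. The function names (`agd_step`, `frobenius_norm`), the plain nested-list matrix representation, and the handling of zero gradients are assumptions made for illustration. Weight initialisation is omitted. Each layer's normalised gradient is scaled by sqrt(d_out / d_in), and the step size is computed from the gradient norms themselves rather than supplied by the user.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def agd_step(weights, grads):
    """One update over all layers of a bias-free fully-connected network.

    weights, grads: lists with one matrix per layer; each matrix is a list
    of d_out rows holding d_in entries. Returns (new_weights, eta).
    """
    depth = len(weights)
    # Architecture-dependent scale sqrt(d_out / d_in) for each layer.
    scales = [math.sqrt(len(w) / len(w[0])) for w in weights]
    norms = [frobenius_norm(g) for g in grads]

    # Gradient summary: scaled gradient norms averaged over the layers.
    summary = sum(s * n for s, n in zip(scales, norms)) / depth
    # Step size derived from the summary; no learning rate is passed in.
    eta = math.log(0.5 * (1.0 + math.sqrt(1.0 + 4.0 * summary)))

    new_weights = []
    for w, g, s, n in zip(weights, grads, scales, norms):
        if n == 0.0:
            # Assumption: a layer with a vanishing gradient is left unchanged.
            new_weights.append([row[:] for row in w])
            continue
        factor = eta / depth * s / n
        new_weights.append(
            [[wi - factor * gi for wi, gi in zip(w_row, g_row)]
             for w_row, g_row in zip(w, g)]
        )
    return new_weights, eta


# Hypothetical one-layer example with a single 1x1 weight matrix.
new_weights, eta = agd_step([[[1.0]]], [[[2.0]]])
```

In this toy call the summary equals 2, so the step size is log 2 and the single weight moves from 1 to 1 - log 2. The point of the sketch is that layer widths and depth enter the update directly, which is the sense in which the optimiser is architecture-dependent.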

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Cite as:	arXiv:2304.05187 [cs.LG]
	(or arXiv:2304.05187v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2304.05187

Submission history

From: Chris Mingard
[v1] Tue, 11 Apr 2023 12:45:52 UTC (4,187 KB)
