
Showing 1–7 of 7 results for author: Freeman, C D

Searching in archive stat.
  1. arXiv:2211.09760 [pdf, other]

    cs.LG math.OC stat.ML

    VeLO: Training Versatile Learned Optimizers by Scaling Up

    Authors: Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein

    Abstract: While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. M…

    Submitted 17 November, 2022; originally announced November 2022.

  2. arXiv:2203.11860 [pdf, other]

    cs.LG cs.NE math.OC stat.ML

    Practical tradeoffs between memory, compute, and performance in learned optimizers

    Authors: Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

    Abstract: Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned opti…

    Submitted 16 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  3. arXiv:2111.05803 [pdf, other]

    cs.LG stat.ML

    Gradients are Not All You Need

    Authors: Luke Metz, C. Daniel Freeman, Samuel S. Schoenholz, Tal Kachman

    Abstract: Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos based failure mode which appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics sim…

    Submitted 20 January, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

  4. arXiv:2009.11243 [pdf, other]

    cs.LG cs.NE stat.ML

    Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

    Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

    Abstract: Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer…

    Submitted 23 September, 2020; originally announced September 2020.

  5. arXiv:2002.11887 [pdf, other]

    cs.LG stat.ML

    Using a thousand optimization tasks to learn hyperparameter search strategies

    Authors: Luke Metz, Niru Maheswaranathan, Ruoxi Sun, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

    Abstract: We present TaskSet, a dataset of tasks for use in training and evaluating optimizers. TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification with fully connected or convolutional neural networks, to variational autoencoders, to non-volume preserving flows on a variety of datasets. As an example application of such a dataset we explore meta-l…

    Submitted 31 March, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

  6. arXiv:1810.10180 [pdf, other]

    cs.NE stat.ML

    Understanding and correcting pathologies in the training of learned optimizers

    Authors: Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

    Abstract: Deep learning has shown that learned functions can dramatically outperform hand-designed functions on perceptual tasks. Analogously, this suggests that learned optimizers may similarly outperform current hand-designed optimizers, especially for specific problems. However, learned optimizers are notoriously difficult to train and have yet to demonstrate wall-clock speedups over hand-designed optimi…

    Submitted 7 June, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

  7. arXiv:1611.01540 [pdf, other]

    stat.ML cs.LG

    Topology and Geometry of Half-Rectified Network Optimization

    Authors: C. Daniel Freeman, Joan Bruna

    Abstract: The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assumpt…

    Submitted 1 June, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

    Comments: 22 Pages (10 main + Appendices), 4 Figures, 1 Table, Published as a conference paper at ICLR 2017