Showing 1–4 of 4 results for author: Shallue, C J

  1. arXiv:2102.06356

    cs.LG stat.ML

    A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

    Authors: Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl

    Abstract: Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether L…

    Submitted 9 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  2. arXiv:1910.05446

    cs.LG stat.ML

    On Empirical Comparisons of Optimizers for Deep Learning

    Authors: Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl

    Abstract: Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this paper, we demonstrate the sensitivity of optimizer comparisons to the hyperparameter tuning protocol. Our findings suggest that the hyperparameter search space may be the single most important factor explaining the rankings obtained by recent empirical comparisons in the literature. In fact, we show that t…

    Submitted 15 June, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

  3. arXiv:1907.04164

    cs.LG stat.ML

    Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

    Authors: Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

    Abstract: Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments, and analysis of a simple noi…

    Submitted 28 October, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: NeurIPS 2019

  4. arXiv:1811.03600

    cs.LG stat.ML

    Measuring the Effects of Data Parallelism on Neural Network Training

    Authors: Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

    Abstract: Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by…

    Submitted 18 July, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Journal ref: Journal of Machine Learning Research 20 (2019) 1-49