
Showing 1–7 of 7 results for author: Balles, L

Searching in archive stat.
  1. arXiv:2207.06940 [pdf, other]

    cs.LG stat.ML

    PASHA: Efficient HPO and NAS with Progressive Resource Allocation

    Authors: Ondrej Bohdal, Lukas Balles, Martin Wistuba, Beyza Ermis, Cédric Archambeau, Giovanni Zappella

    Abstract: Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach t…

    Submitted 8 March, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted at ICLR 2023

  2. arXiv:2002.08056 [pdf, other]

    cs.LG stat.ML

    The Geometry of Sign Gradient Descent

    Authors: Lukas Balles, Fabian Pedregosa, Nicolas Le Roux

    Abstract: Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected to so-called adaptive gradient methods like Adam. Recent works on signSGD have used a non-standard "separable smoothness" assumption, whereas some old…

    Submitted 19 February, 2020; originally announced February 2020.

  3. arXiv:1905.12558 [pdf, other]

    cs.LG stat.ML

    Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

    Authors: Frederik Kunstner, Lukas Balles, Philipp Hennig

    Abstract: Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argumen…

    Submitted 8 June, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: V3: Minor corrections (typographic errors)

  4. arXiv:1903.05499 [pdf, other]

    cs.LG stat.ML

    DeepOBS: A Deep Learning Optimizer Benchmark Suite

    Authors: Frank Schneider, Lukas Balles, Philipp Hennig

    Abstract: Because the choice and tuning of the optimizer affects the speed, and ultimately the performance of deep learning, there is significant past and recent research in this area. Yet, perhaps surprisingly, there is no generally agreed-upon protocol for the quantitative and reproducible evaluation of optimization strategies for deep learning. We suggest routines and benchmarks for stochastic optimizati…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: Accepted at ICLR 2019. 9 pages, 3 figures, 2 tables

  5. arXiv:1705.07774 [pdf, other]

    cs.LG stat.ML

    Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

    Authors: Lukas Balles, Philipp Hennig

    Abstract: The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze th…

    Submitted 13 December, 2020; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Presented at the 35th International Conference on Machine Learning (ICML), 2018

  6. arXiv:1703.09580 [pdf, other]

    cs.LG stat.ML

    Early Stopping without a Validation Set

    Authors: Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig

    Abstract: Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. We propose a novel early stopping crite…

    Submitted 6 June, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: 16 pages, 10 figures

  7. arXiv:1612.05086 [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Coupling Adaptive Batch Sizes with Learning Rates

    Authors: Lukas Balles, Javier Romero, Philipp Hennig

    Abstract: Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of t…

    Submitted 28 June, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

    Comments: Thirty-Third Conference on Uncertainty in Artificial Intelligence (UAI), 2017, (accepted)
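The sign-based methods mentioned in the abstract of "The Geometry of Sign Gradient Descent" (item 2) replace each gradient coordinate by its sign. A minimal sketch, assuming the plain signSGD step x <- x - lr * sign(g) applied coordinate-wise; the quadratic test function and the names `sign_sgd_step` and `quadratic_grad` are hypothetical illustrations and not the analysis from the paper. Only one bit per coordinate (the sign) enters the update, which is why such methods are cheap to communicate in distributed settings.

```python
from typing import List


def sign(v: float) -> float:
    """Return +1.0, -1.0, or 0.0 depending on the sign of v."""
    if v > 0:
        return 1.0
    if v < 0:
        return -1.0
    return 0.0


def sign_sgd_step(x: List[float], grad: List[float], lr: float) -> List[float]:
    """One signSGD step: move every coordinate by lr against the sign of its gradient."""
    return [xi - lr * sign(gi) for xi, gi in zip(x, grad)]


def quadratic_grad(x: List[float]) -> List[float]:
    """Gradient of the hypothetical test function f(x) = x0**2 + 100 * x1**2."""
    curvature = [1.0, 100.0]
    return [2.0 * c * xi for c, xi in zip(curvature, x)]


x = [1.0, 1.0]
for _ in range(4):
    x = sign_sgd_step(x, quadratic_grad(x), lr=0.25)
print(x)
```

Both coordinates shrink by exactly lr per step even though their gradients differ by a factor of 100, because the step discards gradient magnitude; with lr = 0.25 they reach 0.0 after four steps.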
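The abstract of "Limitations of the Empirical Fisher Approximation for Natural Gradient Descent" (item 3) contrasts the Fisher information matrix used by natural gradient descent with the empirical Fisher. A toy sketch, assuming a 1-D logistic regression model p(y = 1 | x) = sigmoid(theta * x): the Fisher averages squared gradients over labels drawn from the model, which gives p(1 - p) x^2, whereas the empirical Fisher plugs in the observed labels, giving (y - p)^2 x^2. All function names and the small damping term are hypothetical choices for illustration, not the paper's experiments.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def nll_grad(theta: float, xs: List[float], ys: List[int]) -> float:
    """Gradient of the mean negative log-likelihood of 1-D logistic regression."""
    return sum((sigmoid(theta * x) - y) * x for x, y in zip(xs, ys)) / len(xs)


def fisher(theta: float, xs: List[float]) -> float:
    """Fisher information: expectation over y ~ p(y | x, theta), so observed labels are unused."""
    total = 0.0
    for x in xs:
        p = sigmoid(theta * x)
        total += p * (1.0 - p) * x * x
    return total / len(xs)


def empirical_fisher(theta: float, xs: List[float], ys: List[int]) -> float:
    """Empirical Fisher: mean squared per-example gradient at the observed labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        g = (sigmoid(theta * x) - y) * x
        total += g * g
    return total / len(xs)


def preconditioned_step(theta: float, xs: List[float], ys: List[int],
                        curvature: float, lr: float = 1.0, damping: float = 1e-8) -> float:
    """Gradient step divided by a (damped) scalar curvature estimate."""
    return theta - lr * nll_grad(theta, xs, ys) / (curvature + damping)


xs, ys = [1.0], [1]
for theta in (0.0, 2.0):
    print(theta, fisher(theta, xs), empirical_fisher(theta, xs, ys))
```

For the single example x = 1, y = 1 the two quantities coincide at theta = 0 (both 0.25), but at theta = 2 the empirical Fisher (about 0.014) is far below the Fisher (about 0.105). In this toy case the undamped Fisher-preconditioned step moves theta by 1/p, while the empirical-Fisher step moves it by 1/(1 - p), which grows as the fit improves.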
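The abstract of "Dissecting Adam" (item 5) reads Adam's per-weight update as a sign times a magnitude governed by the relative variance of the stochastic gradients. A simplified sketch of that reading, assuming the second-moment estimate is the raw mean of squared gradients, so that v = m^2 + s^2 with m the mean and s^2 the (population) variance of a set of gradient samples; exponential averaging, bias correction and Adam's epsilon are deliberately left out. Under that assumption m / sqrt(v) equals sign(m) / sqrt(1 + eta^2), with eta^2 = s^2 / m^2. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import List, Tuple


def adam_like_direction(grad_samples: List[float]) -> float:
    """m / sqrt(v) for one weight: m is the mean, v the raw second moment of the samples."""
    m = fmean(grad_samples)
    v = fmean([g * g for g in grad_samples])
    return m / math.sqrt(v)


def sign_and_magnitude(grad_samples: List[float]) -> Tuple[float, float]:
    """Split the same quantity into a sign and a magnitude driven by relative variance."""
    m = fmean(grad_samples)
    s2 = pvariance(grad_samples, mu=m)
    eta2 = s2 / (m * m)  # relative variance; assumes the mean gradient m is nonzero
    return math.copysign(1.0, m), 1.0 / math.sqrt(1.0 + eta2)


for samples in ([2.0, 2.0], [1.0, 3.0], [-1.0, 3.0], [-1.0, -3.0]):
    sgn, mag = sign_and_magnitude(samples)
    print(samples, adam_like_direction(samples), sgn * mag)
```

Consistent gradients such as [2.0, 2.0] give magnitude 1.0, while noisy ones such as [-1.0, 3.0] (mean 1, variance 4) are shrunk to 1/sqrt(5); the sign is the same in both cases.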
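The abstract of "Coupling Adaptive Batch Sizes with Learning Rates" (item 7) notes that the batch size determines the variance of the gradient estimate. A small check of the standard fact behind this, assuming batches are drawn uniformly with replacement from a fixed set of scalar per-example gradients: the variance of the mini-batch mean is sigma^2 / B. The sketch enumerates every possible batch, so the result is exact rather than a Monte Carlo estimate; the function name and data are hypothetical, and the paper's actual batch-size rule is not reproduced here.

```python
from itertools import product
from statistics import fmean, pvariance
from typing import List


def minibatch_mean_variance(per_example_grads: List[float], batch_size: int) -> float:
    """Exact variance of the mini-batch mean gradient for batches drawn uniformly
    with replacement, computed by enumerating every ordered batch (each equally likely)."""
    batch_means = [fmean(batch) for batch in product(per_example_grads, repeat=batch_size)]
    return pvariance(batch_means)


grads = [0.0, 1.0, 2.0, 5.0]  # hypothetical per-example gradients
sigma2 = pvariance(grads)     # population variance of a single-example gradient
for b in (1, 2, 3):
    print(b, minibatch_mean_variance(grads, b), sigma2 / b)
```

Doubling the batch size halves the gradient variance here; sampling without replacement would add a finite-population correction, which this sketch does not model. Enumeration costs n^B batches, so it is only meant for tiny examples.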