Showing 1–26 of 26 results for author: Bottou, L

Searching in archive stat.
  1. arXiv:2306.00802  [pdf, other]

    stat.ML cs.CL cs.LG

    Birth of a Transformer: A Memory Viewpoint

    Authors: Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, Leon Bottou

    Abstract: Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We stu…

    Submitted 6 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  2. arXiv:2204.03632  [pdf, other]

    cs.LG cs.CV stat.ML

    The Effects of Regularization and Data Augmentation are Class Dependent

    Authors: Randall Balestriero, Leon Bottou, Yann LeCun

    Abstract: Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that…

    Submitted 8 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  3. arXiv:2106.09467  [pdf, other]

    cs.LG stat.ML

    Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation

    Authors: Agnieszka Słowik, Léon Bottou

    Abstract: Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination of underrepresented gender and ethnic groups. Given the importance of bias mitigati…

    Submitted 17 June, 2021; originally announced June 2021.

  4. arXiv:2003.02395  [pdf, other]

    stat.ML cs.LG

    A Simple Convergence Proof of Adam and Adagrad

    Authors: Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

    Abstract: We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients. We show that in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper-bound which is explicit in the constants of the problem, parameters of the optimizer, th…

    Submitted 17 October, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: final TMLR version

  5. arXiv:1911.13254  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Music Source Separation in the Waveform Domain

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrarily to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source…

    Submitted 28 April, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

  6. arXiv:1909.13334  [pdf, other]

    cs.LG stat.ML

    Symplectic Recurrent Neural Networks

    Authors: Zhengdao Chen, Jianyu Zhang, Martin Arjovsky, Léon Bottou

    Abstract: We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. An SRNN models the Hamiltonian function of the system by a neural network and furthermore leverages symplectic integration, multiple-step training and initial state optimization to address the challenging numerical issues associated with Hamiltoni…

    Submitted 25 April, 2020; v1 submitted 29 September, 2019; originally announced September 2019.

    Comments: Added link to GitHub repository

    Journal ref: 8th International Conference on Learning Representations (ICLR 2020)

  7. arXiv:1909.01174  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms while methods working on the waveform are lagging behind as measured on the standard MusDB benchmark. Our contribution is two fold. (i) We introduce a simple convolutional and recurren…

    Submitted 3 September, 2019; originally announced September 2019.

  8. arXiv:1907.02893  [pdf, other]

    stat.ML cs.AI cs.LG

    Invariant Risk Minimization

    Authors: Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz

    Abstract: We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the cau…

    Submitted 27 March, 2020; v1 submitted 5 July, 2019; originally announced July 2019.

  9. Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks

    Authors: Aaron Defazio, Léon Bottou

    Abstract: We propose a system for calculating a "scaling constant" for layers and weights of neural networks. We relate this scaling constant to two important quantities that relate to the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights have the same scaling constant, will be easier to train. This scaling calculus results in a n…

    Submitted 11 February, 2021; v1 submitted 10 June, 2019; originally announced June 2019.

    Journal ref: Neural Comput & Applic (2022)

  10. arXiv:1905.10498  [pdf, other]

    cs.LG cs.CV stat.ML

    Cold Case: The Lost MNIST Digits

    Authors: Chhavi Yadav, Léon Bottou

    Abstract: Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata…

    Submitted 4 November, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Final NeurIPS version

  11. arXiv:1812.04549  [pdf, other]

    cs.LG stat.ML

    Controlling Covariate Shift using Balanced Normalization of Weights

    Authors: Aaron Defazio, Léon Bottou

    Abstract: We introduce a new normalization technique that exhibits the fast convergence properties of batch normalization using a transformation of layer weights instead of layer outputs. The proposed technique keeps the contribution of positive and negative weights to the layer output balanced. We validate our method on a set of standard benchmarks including CIFAR-10/100, SVHN and ILSVRC 2012 ImageNet.

    Submitted 10 May, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

  12. arXiv:1812.04529  [pdf, other]

    cs.LG stat.ML

    On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

    Authors: Aaron Defazio, Léon Bottou

    Abstract: The application of stochastic variance reduction to optimization has shown remarkable recent theoretical and practical success. The applicability of these techniques to the hard non-convex optimization problems encountered during training of modern deep neural networks is an open problem. We show that naive application of the SVRG technique and related approaches fail, and explore why.

    Submitted 20 November, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

  13. arXiv:1810.09785  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    SING: Symbol-to-Instrument Neural Generator

    Authors: Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because…

    Submitted 23 October, 2018; originally announced October 2018.

    Journal ref: Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada

  14. arXiv:1806.01811  [pdf, ps, other]

    stat.ML cs.LG

    AdaGrad stepsizes: Sharp convergence over nonconvex landscapes

    Authors: Rachel Ward, Xiaoxia Wu, Leon Bottou

    Abstract: Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune the stepsize schedule. Yet, the theoretical guarantees to date for AdaGrad are for online…

    Submitted 18 April, 2021; v1 submitted 5 June, 2018; originally announced June 2018.

    Journal ref: Journal of Machine Learning Research, 2020, volume 21, number 219, pages 1-30, http://jmlr.org/papers/v21/18-352.html

  15. arXiv:1803.02865  [pdf, ps, other]

    stat.ML cs.AI cs.LG math.NA math.OC

    WNGrad: Learn the Learning Rate in Gradient Descent

    Authors: Xiaoxia Wu, Rachel Ward, Léon Bottou

    Abstract: Adjusting the learning rate schedule in stochastic gradient methods is an important unresolved problem which requires tuning in practice. If certain parameters of the loss function such as smoothness or strong convexity constants are known, theoretical learning rate schedules can be applied. However, in practice, such parameters are not known, and the loss function of interest is not convex in any…

    Submitted 19 November, 2020; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: 10 pages, 3 figures, conference

    MSC Class: 80M50; 90C15; 90C26; 90C30; 68T05

  16. arXiv:1802.01421  [pdf, other]

    stat.ML cs.CV cs.LG

    First-order Adversarial Vulnerability of Neural Networks and Input Dimension

    Authors: Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz

    Abstract: Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: for many standard netwo…

    Submitted 16 June, 2019; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: Paper previously called: "Adversarial Vulnerability of Neural Networks Increases with Input Dimension". 9 pages main text and references, 11 pages appendix, 14 figures

    MSC Class: 68T45; ACM Class: I.2.6

    Journal ref: Proceedings of ICML 2019

  17. arXiv:1712.07822  [pdf, other]

    stat.ML cs.AI cs.LG

    Geometrical Insights for Implicit Generative Modeling

    Authors: Leon Bottou, Martin Arjovsky, David Lopez-Paz, Maxime Oquab

    Abstract: Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences…

    Submitted 21 August, 2019; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: this version fixes a typo in a definition

  18. arXiv:1705.09319  [pdf, other]

    cs.LG stat.ML

    Diagonal Rescaling For Neural Networks

    Authors: Jean Lafond, Nicolas Vasilache, Léon Bottou

    Abstract: We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks in robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tric…

    Submitted 25 May, 2017; originally announced May 2017.

  19. arXiv:1701.07875  [pdf, other]

    stat.ML cs.LG

    Wasserstein GAN

    Authors: Martin Arjovsky, Soumith Chintala, Léon Bottou

    Abstract: We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical wor…

    Submitted 6 December, 2017; v1 submitted 26 January, 2017; originally announced January 2017.

  20. arXiv:1701.04862  [pdf, other]

    stat.ML cs.LG

    Towards Principled Methods for Training Generative Adversarial Networks

    Authors: Martin Arjovsky, Léon Bottou

    Abstract: The goal of this paper is not to introduce a single algorithm or method, but to make theoretical steps towards fully understanding the training dynamics of generative adversarial networks. In order to substantiate our theoretical analysis, we perform targeted experiments to verify our assumptions, illustrate our claims, and quantify the phenomena. This paper is divided into three sections. The fir…

    Submitted 17 January, 2017; originally announced January 2017.

  21. arXiv:1606.04838  [pdf, other]

    stat.ML cs.LG math.OC

    Optimization Methods for Large-Scale Machine Learning

    Authors: Léon Bottou, Frank E. Curtis, Jorge Nocedal

    Abstract: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine…

    Submitted 8 February, 2018; v1 submitted 15 June, 2016; originally announced June 2016.

  22. arXiv:1605.08179  [pdf, other]

    stat.ML cs.CV

    Discovering Causal Signals in Images

    Authors: David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, Léon Bottou

    Abstract: This paper establishes the existence of observable footprints that reveal the "causal dispositions" of the object categories appearing in collections of images. We achieve this goal in two steps. First, we take a learning approach to observational causal discovery, and build a classifier that achieves state-of-the-art performance on finding the causal direction between pairs of random variables, g…

    Submitted 31 October, 2017; v1 submitted 26 May, 2016; originally announced May 2016.

  23. arXiv:1511.03643  [pdf, other]

    stat.ML cs.LG

    Unifying distillation and privileged information

    Authors: David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik

    Abstract: Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies these two techniques into generalized distillation, a framework to learn from multiple machines and data representations. We provide theoretical and causal insight about the inner workings of generalized distillation, exten…

    Submitted 25 February, 2016; v1 submitted 11 November, 2015; originally announced November 2015.

    Journal ref: Proceedings of the International Conference on Learning Representations (2016) 1-10

  24. arXiv:1508.02933  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    No Regret Bound for Extreme Bandits

    Authors: Robert Nishihara, David Lopez-Paz, Léon Bottou

    Abstract: Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with s…

    Submitted 11 April, 2016; v1 submitted 12 August, 2015; originally announced August 2015.

    Comments: 11 pages, International Conference on Artificial Intelligence and Statistics, 2016

  25. arXiv:1410.0723  [pdf, ps, other]

    stat.ML math.OC

    A Lower Bound for the Optimization of Finite Sums

    Authors: Alekh Agarwal, Leon Bottou

    Abstract: This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $μ$-strongly convex. We show that no algorithm can reach an error $ε$ in minimizing all functions from this class in fewer than $Ω(n + \sqrt{n(κ-1)}\log(1/ε))$ iterations, where $κ=L/μ$ is a surrogate condition number. We then compare this lower bound to upper bounds for…

    Submitted 3 October, 2015; v1 submitted 2 October, 2014; originally announced October 2014.

    Comments: Added an erratum, we are currently working on extending the result to randomized algorithms

  26. arXiv:1310.8243  [pdf, other]

    cs.LG stat.ML

    Para-active learning

    Authors: Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford

    Abstract: Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show tha…

    Submitted 30 October, 2013; originally announced October 2013.