
Showing 1–10 of 10 results for author: Gomez, A N

Searching in archive stat.
  1. arXiv:2209.13569  [pdf, other]

    cs.LG stat.ML

    Exploring Low Rank Training of Deep Neural Networks

    Authors: Siddhartha Rao Kamalakara, Acyr Locatelli, Bharat Venkitesh, Jimmy Ba, Yarin Gal, Aidan N. Gomez

    Abstract: Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and training in low rank space with additional objectives, offering various ad hoc explanations for chosen…

    Submitted 27 September, 2022; originally announced September 2022.

  2. arXiv:2106.05833  [pdf, other]

    cs.DS stat.ME

    Incremental space-filling design based on coverings and spacings: improving upon low discrepancy sequences

    Authors: Amaya Nogales Gómez, Luc Pronzato, Maria-João Rendas

    Abstract: The paper addresses the problem of defining families of ordered sequences $\{x_i\}_{i\in N}$ of elements of a compact subset $X$ of $R^d$ whose prefixes $X_n=\{x_i\}_{i=1}^{n}$, for all orders $n$, have good space-filling properties as measured by the dispersion (covering radius) criterion. Our ultimate aim is the definition of incremental algorithms that generate sequences $X_n$ with small optima…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: 28 pages, 13 figures

    MSC Class: 62K99; 65D99

  3. arXiv:2106.02584  [pdf, other]

    cs.LG stat.ML

    Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

    Authors: Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal

    Abstract: We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships betw…

    Submitted 1 February, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at NeurIPS 2021. First two authors contributed equally

  4. arXiv:2103.06002  [pdf, other]

    cs.LG stat.ML

    Robustness to Pruning Predicts Generalization in Deep Neural Networks

    Authors: Lorenz Kuhn, Clare Lyle, Aidan N. Gomez, Jonas Rothfuss, Yarin Gal

    Abstract: Existing generalization measures that aim to capture a model's simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks. In this paper, we introduce a new, theoretically motivated measure of a network's simplicity which we call prunability: the smallest \emph{fraction} of the network's parameters that can be kept while pruning without a…

    Submitted 10 March, 2021; originally announced March 2021.

  5. arXiv:2007.10909  [pdf, other]

    cs.LG stat.ML

    Improving compute efficacy frontiers with SliceOut

    Authors: Pascal Notin, Aidan N. Gomez, Joanna Yoo, Yarin Gal

    Abstract: Pushing forward the compute efficacy frontier in deep learning is critical for tasks that require frequent model re-training or workloads that entail training a large number of models. We introduce SliceOut -- a dropout-inspired scheme designed to take advantage of GPU memory layout to train deep learning models faster without impacting final test accuracy. By dropping contiguous sets of units at…

    Submitted 31 March, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

  6. arXiv:2006.08344  [pdf, other]

    cs.CL cs.LG stat.ML

    Wat zei je? Detecting Out-of-Distribution Translations with Variational Transformers

    Authors: Tim Z. Xiao, Aidan N. Gomez, Yarin Gal

    Abstract: We detect out-of-training-distribution sentences in Neural Machine Translation using the Bayesian Deep Learning equivalent of Transformer models. For this we develop a new measure of uncertainty designed specifically for long sequences of discrete random variables -- i.e. words in the output sentence. Our new measure of uncertainty solves a major intractability in the naive application of existing…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 19 pages, 9 figures

  7. arXiv:1912.10481  [pdf, other]

    stat.ML cs.LG eess.IV

    A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks

    Authors: Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal

    Abstract: Evaluation of Bayesian deep learning (BDL) methods is challenging. We often seek to evaluate the methods' robustness and scalability, assessing whether new tools give `better' uncertainty estimates than old ones. These evaluations are paramount for practitioners when choosing BDL tools on-top of which they build their applications. Current popular evaluations of BDL methods, such as the UCI experi…

    Submitted 22 December, 2019; originally announced December 2019.

  8. arXiv:1905.13678  [pdf, other]

    cs.LG stat.ML

    Learning Sparse Networks Using Targeted Dropout

    Authors: Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

    Abstract: Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away connections or hidden units. But standard training does not necessarily encourage nets to be amenable to pruning. We introduce targeted dropout, a method for traini…

    Submitted 9 September, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

  9. arXiv:1803.07416  [pdf, other]

    cs.LG cs.CL stat.ML

    Tensor2Tensor for Neural Machine Translation

    Authors: Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

    Abstract: Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: arXiv admin note: text overlap with arXiv:1706.03762

  10. arXiv:1706.05137  [pdf, other]

    cs.LG stat.ML

    One Model To Learn Them All

    Authors: Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

    Abstract: Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrentl…

    Submitted 15 June, 2017; originally announced June 2017.