Showing 1–8 of 8 results for author: Dehghani, M

  1. arXiv:2110.12894

    cs.LG cs.AI cs.CL cs.CV stat.ML

    The Efficiency Misnomer

    Authors: Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

    Abstract: Model efficiency is a critical aspect of developing and deploying machine learning models. Inference time and latency directly affect the user experience, and some applications have hard requirements. In addition to inference costs, model training also has direct financial and environmental impacts. Although there are numerous well-established metrics (cost indicators) for measuring model efficie…

    Submitted 16 March, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

  2. arXiv:2110.02095

    cs.LG cs.AI cs.CV stat.ML

    Exploring the Limits of Large Scale Pre-training

    Authors: Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

    Abstract: Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks. In this work, we systematically study this phenomenon and establish that, as we increase the upstream accuracy, the performance of downstream tasks saturates. In particular,…

    Submitted 5 October, 2021; originally announced October 2021.

  3. arXiv:2006.12459

    cs.LG stat.ML

    IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression

    Authors: Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

    Abstract: In this paper we analyse and improve integer discrete flows for lossless compression. Integer discrete flows are a recently proposed class of models that learn invertible transformations for integer-valued random variables. Their discrete nature makes them particularly suitable for lossless compression with entropy coding schemes. We start by investigating a recent theoretical claim that states th…

    Submitted 23 March, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted as a conference paper at the Ninth International Conference on Learning Representations (ICLR) 2021

  4. arXiv:2006.00555

    cs.LG cs.AI stat.ML

    Transferring Inductive Biases through Knowledge Distillation

    Authors: Samira Abnar, Mostafa Dehghani, Willem Zuidema

    Abstract: Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time. However, defining, designing and efficiently adapting inductive biases is not necessarily straightforward. In this paper, we explore the power of knowledge distillation for transfe…

    Submitted 4 October, 2020; v1 submitted 31 May, 2020; originally announced June 2020.

  5. arXiv:2003.12140

    cs.LG physics.ao-ph stat.ML

    MetNet: A Neural Weather Model for Precipitation Forecasting

    Authors: Casper Kaae Sønderby, Lasse Espeholt, Jonathan Heek, Mostafa Dehghani, Avital Oliver, Tim Salimans, Shreya Agrawal, Jason Hickey, Nal Kalchbrenner

    Abstract: Weather forecasting is a long-standing scientific challenge with direct social and economic impact. The task is suitable for deep neural networks due to vast amounts of continuously collected data and a rich spatial and temporal structure that presents long-range dependencies. We introduce MetNet, a neural network that forecasts precipitation up to 8 hours into the future at the high spatial resol…

    Submitted 30 March, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

  6. arXiv:1807.03819

    cs.CL cs.LG stat.ML

    Universal Transformers

    Authors: Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

    Abstract: Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train. Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine tr…

    Submitted 5 March, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: Published at ICLR 2019

  7. arXiv:1711.11383

    stat.ML cs.AI cs.CL cs.LG

    Learning to Learn from Weak Supervision by Full Supervision

    Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

    Abstract: In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network, the learner and a confidence network, the meta-learner. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that ar…

    Submitted 30 November, 2017; originally announced November 2017.

    Comments: Accepted at NIPS Workshop on Meta-Learning (MetaLearn 2017), Long Beach, CA, USA

  8. arXiv:1711.00313

    cs.LG cs.CL cs.NE stat.ML

    Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

    Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

    Abstract: Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or user click-through data for training. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-t…

    Submitted 7 December, 2017; v1 submitted 1 November, 2017; originally announced November 2017.