Showing 1–23 of 23 results for author: Hoffer, E

  1. arXiv:2503.19693  [pdf, other]

    cs.CL

    AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

    Authors: Itay Nakash, Nitay Calderon, Eyal Ben David, Elad Hoffer, Roi Reichart

    Abstract: Large Language Models (LLMs) have shown impressive versatility as general purpose models. However, their broad applicability comes at a high-cost computational overhead, particularly in auto-regressive decoding where each step requires a forward pass. In domain-specific settings, general-purpose capabilities are unnecessary and can be exchanged for efficiency. In this work, we take a novel perspec…

    Submitted 25 March, 2025; originally announced March 2025.

  2. arXiv:2306.10598  [pdf, other]

    cs.LG

    DropCompute: simple and more robust distributed synchronous training via compute variance reduction

    Authors: Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry

    Abstract: Background: Distributed training is essential for large scale training of deep neural networks (DNNs). The dominant methods for large scale DNN training are synchronous (e.g. All-Reduce), but these require waiting for all workers in each step. Thus, these methods are limited by the delays caused by straggling workers. Results: We study a typical scenario in which workers are straggling due to vari…

    Submitted 24 September, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: https://github.com/paper-submissions/dropcompute

    Journal ref: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  3. arXiv:2202.02783  [pdf, other]

    cs.LG cs.CV

    Energy awareness in low precision neural networks

    Authors: Nurit Spingarn Eliezer, Ron Banner, Elad Hoffer, Hilla Ben-Yaakov, Tomer Michaeli

    Abstract: Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices. Existing approaches for reducing power consumption rely on quite general principles, including avoidance of multiplication operations and aggressive quantization of weights and activations. However, these methods do not take into account the precise power consumed by each module in the network, a…

    Submitted 6 February, 2022; originally announced February 2022.

  4. arXiv:2112.10769  [pdf, other]

    cs.LG

    Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

    Authors: Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry

    Abstract: Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e…

    Submitted 9 June, 2024; v1 submitted 19 December, 2021; originally announced December 2021.

  5. Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates

    Authors: Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry

    Abstract: Background: Catastrophic forgetting is the notorious vulnerability of neural networks to the changes in the data distribution during learning. This phenomenon has long been considered a major obstacle for using learning agents in realistic continual learning settings. A large body of continual learning research assumes that task boundaries are known during training. However, only a few works consi…

    Submitted 18 October, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: The arXiv paper "Task Agnostic Continual Learning Using Online Variational Bayes" is a preliminary pre-print of this paper. The main differences between the versions are: 1. We develop a new algorithmic framework (FOO-VB). 2. We add multivariate Gaussian and matrix variate Gaussian versions of the algorithm. 3. We demonstrate the new algorithm's performance in task agnostic scenarios

    Journal ref: Neural Comput 2021; 33 (11)

  6. arXiv:2006.08173  [pdf, other]

    cs.CV cs.LG

    Neural gradients are near-lognormal: improved quantized and sparse training

    Authors: Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry

    Abstract: While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties. Distinguished from weights and activations, we find that the distribution of neura…

    Submitted 12 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  7. arXiv:1912.01274  [pdf, other]

    cs.LG cs.CV stat.ML

    The Knowledge Within: Methods for Data-Free Model Compression

    Authors: Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry

    Abstract: Recently, an extensive amount of research has been focused on compressing and accelerating Deep Neural Networks (DNN). So far, high compression rate algorithms require part of the training dataset for a low precision calibration, or a fine-tuning process. However, this requirement is unacceptable when the data is unavailable or contains sensitive information, as in medical and biometric use-cases.…

    Submitted 6 April, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

  8. arXiv:1909.12340  [pdf, other]

    cs.LG stat.ML

    At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

    Authors: Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry

    Abstract: Background: Recent developments have made it possible to accelerate neural networks training significantly using large batch sizes and data parallelism. Training in an asynchronous fashion, where delay occurs, can make training even more scalable. However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm. This gap remains not w…

    Submitted 13 February, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: ICLR 2020 Camera ready version

  9. arXiv:1908.08986  [pdf, other]

    cs.CV cs.LG stat.ML

    Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency

    Authors: Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry

    Abstract: Convolutional neural networks (CNNs) are commonly trained using a fixed spatial image size predetermined for a given model. Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps. In this work, we describe and evaluate a novel mixed-size training regime that…

    Submitted 12 August, 2019; originally announced August 2019.

  10. arXiv:1901.09335  [pdf, other]

    cs.LG stat.ML

    Augment your batch: better training with larger batches

    Authors: Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

    Abstract: Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization a…

    Submitted 27 January, 2019; originally announced January 2019.

  11. arXiv:1810.05723  [pdf, other]

    cs.CV

    Post-training 4-bit quantization of convolution networks for rapid-deployment

    Authors: Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

    Abstract: Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of intermediate results, but it often requires the full datasets and time-consuming fine tuning to recover the accuracy lost after quantization. This paper introduces the…

    Submitted 29 May, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

  12. arXiv:1805.11046  [pdf, other]

    cs.LG stat.ML

    Scalable Methods for 8-bit Training of Neural Networks

    Authors: Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry

    Abstract: Quantized Neural Networks (QNNs) are often used to improve network efficiency during the inference phase, i.e. after the network has been trained. Extensive research in the field suggests many different quantization schemes. Still, the number of bits required, as well as the best quantization scheme, are yet unknown. Our theoretical analysis suggests that most of the training process is robust to…

    Submitted 17 June, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

  13. arXiv:1803.10123  [pdf, other]

    stat.ML cs.LG

    Task Agnostic Continual Learning Using Online Variational Bayes

    Authors: Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry

    Abstract: Catastrophic forgetting is the notorious vulnerability of neural networks to the change of the data distribution while learning. This phenomenon has long been considered a major obstacle for allowing the use of learning agents in realistic continual learning settings. A large body of continual learning research assumes that task boundaries are known during training. However, research for scenarios…

    Submitted 12 February, 2019; v1 submitted 27 March, 2018; originally announced March 2018.

  14. arXiv:1803.01814  [pdf, other]

    stat.ML cs.LG

    Norm matters: efficient and accurate normalization schemes in deep networks

    Authors: Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry

    Abstract: Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several shortcomings that hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-dec…

    Submitted 7 February, 2019; v1 submitted 5 March, 2018; originally announced March 2018.

    Comments: http://papers.nips.cc/paper/7485-norm-matters-efficient-and-accurate-normalization-schemes-in-deep-networks

    Journal ref: NeurIPS2018

  15. arXiv:1802.05187  [pdf, other]

    stat.ML cs.LG

    On the Blindspots of Convolutional Networks

    Authors: Elad Hoffer, Shai Fine, Daniel Soudry

    Abstract: Deep convolutional networks have been the state-of-the-art approach for a wide variety of tasks over the last few years. Their successes have, in many cases, turned them into the default model in quite a few domains. In this work, we will demonstrate that convolutional networks have limitations that may, in some cases, hinder them from learning properties of the data, which are easily recognizable by trad…

    Submitted 8 July, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

  16. arXiv:1801.04540  [pdf, other]

    cs.LG cs.CV stat.ML

    Fix your classifier: the marginal value of training the last weight layer

    Authors: Elad Hoffer, Itay Hubara, Daniel Soudry

    Abstract: Neural networks are commonly used as models for classification for a wide variety of tasks. Typically, a learned affine transformation is placed at the end of such models, yielding a per-class value used for classification. This classifier can have a vast number of parameters, which grows linearly with the number of possible classes, thus requiring increasingly more resources. In this work we argu…

    Submitted 20 March, 2018; v1 submitted 14 January, 2018; originally announced January 2018.

    Comments: https://openreview.net/forum?id=S1Dh8Tg0-

    Journal ref: International Conference on Learning Representations 2018

  17. arXiv:1710.10345  [pdf, ps, other]

    stat.ML cs.LG

    The Implicit Bias of Gradient Descent on Separable Data

    Authors: Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

    Abstract: We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a d…

    Submitted 26 October, 2024; v1 submitted 27 October, 2017; originally announced October 2017.

    Comments: Added a missing assumption to Theorem 7 (multi-class case) and a discussion of this assumption after the Theorem

  18. arXiv:1705.08741  [pdf, other]

    stat.ML cs.LG

    Train longer, generalize better: closing the generalization gap in large batch training of neural networks

    Authors: Elad Hoffer, Itay Hubara, Daniel Soudry

    Abstract: Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance - known as the "generalization gap" phenomenon. Identifying…

    Submitted 1 January, 2018; v1 submitted 24 May, 2017; originally announced May 2017.

    Journal ref: Advances in Neural Information Processing Systems 30 2017; pages 1729-1739; http://papers.nips.cc/paper/6770-train-longer-generalize-better-closing-the-generalization-gap-in-large-batch-training-of-neural-networks

  19. arXiv:1702.05777  [pdf, ps, other]

    stat.ML

    Exponentially vanishing sub-optimal local minima in multilayer neural networks

    Authors: Daniel Soudry, Elad Hoffer

    Abstract: Background: Statistical mechanics results (Dauphin et al. (2014); Choromanska et al. (2015)) suggest that local minima with high error are exponentially rare in high dimensions. However, to prove low error guarantees for Multilayer Neural Networks (MNNs), previous works so far required either a heavily modified MNN model or training method, strong assumptions on the labels (e.g., "near" linear sep…

    Submitted 28 October, 2017; v1 submitted 19 February, 2017; originally announced February 2017.

  20. arXiv:1611.06996  [pdf, ps, other]

    stat.ML cs.LG

    Spatial contrasting for deep unsupervised learning

    Authors: Elad Hoffer, Itay Hubara, Nir Ailon

    Abstract: Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training me…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  21. arXiv:1611.01449  [pdf, other]

    cs.LG

    Semi-supervised deep learning by metric embedding

    Authors: Elad Hoffer, Nir Ailon

    Abstract: Deep networks are successfully used as classification models yielding state-of-the-art results when trained on a large number of labeled samples. These models, however, are usually much less suited for semi-supervised problems because of their tendency to overfit easily when trained on small amounts of data. In this work we will explore a new training objective that is targeting a semi-supervised…

    Submitted 4 December, 2018; v1 submitted 4 November, 2016; originally announced November 2016.

  22. arXiv:1610.00243  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep unsupervised learning through spatial contrasting

    Authors: Elad Hoffer, Itay Hubara, Nir Ailon

    Abstract: Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training me…

    Submitted 4 December, 2018; v1 submitted 2 October, 2016; originally announced October 2016.

  23. arXiv:1412.6622  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep metric learning using Triplet network

    Authors: Elad Hoffer, Nir Ailon

    Abstract: Deep learning has proven itself as a successful set of models for learning useful semantic representations of data. These, however, are mostly implicitly learned as part of a classification task. In this paper we propose the triplet network model, which aims to learn useful representations by distance comparisons. A similar model was defined by Wang et al. (2014), tailor made for learning a rankin…

    Submitted 4 December, 2018; v1 submitted 20 December, 2014; originally announced December 2014.