
Showing 1–50 of 58 results for author: Rastegari, M

  1. arXiv:2410.10714  [pdf, ps, other]

    cs.LG cs.AI

    SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

    Authors: Rasoul Shafipour, David Harrison, Maxwell Horton, Jeffrey Marker, Houman Bedayat, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi, Saman Naderiparizi

    Abstract: Large Language Models (LLMs) have transformed natural language processing, but face significant challenges in widespread deployment due to their high runtime cost. In this paper, we introduce SeedLM, a novel post-training compression method that uses seeds of pseudo-random generators to encode and compress model weights. Specifically, for each block of weights, we find a seed that is fed into a Li…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

  2. arXiv:2410.08391  [pdf, other]

    cs.CL cs.AI

    KV Prediction for Improved Time to First Token

    Authors: Maxwell Horton, Qingqing Cao, Chenfan Sun, Yanzi Jin, Sachin Mehta, Mohammad Rastegari, Moin Nabi

    Abstract: Inference with transformer-based language models begins with a prompt processing step. In this step, the model generates the first output token and stores the KV cache needed for future generation steps. This prompt processing step can be computationally expensive, taking 10s of seconds or more for billion-parameter models on edge devices when prompt lengths or batch sizes rise. This degrades user…

    Submitted 10 October, 2024; originally announced October 2024.

  3. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, et al. (536 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 23 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  4. arXiv:2407.14057  [pdf, other]

    cs.CL cs.AI cs.LG

    LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

    Authors: Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi

    Abstract: The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first tok…

    Submitted 19 July, 2024; originally announced July 2024.

  5. arXiv:2405.05329  [pdf, other]

    cs.DC cs.AI cs.CL

    KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

    Authors: Minsik Cho, Mohammad Rastegari, Devang Naik

    Abstract: Large Language Model or LLM inference has two phases, the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to generate the subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of key-va…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: preprint for ICML 2024

  6. arXiv:2404.15653  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

    Authors: Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari

    Abstract: Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed m…

    Submitted 24 April, 2024; originally announced April 2024.

  7. arXiv:2404.14619  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

    Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari

    Abstract: The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of…

    Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Minor corrections

  8. arXiv:2404.06910  [pdf, other]

    cs.CL cs.AI cs.LG

    Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

    Authors: Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi

    Abstract: Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in some real-world text processing applications, such as retrieval-augmented generation (RAG). Additionally, LLMs also exhibit the "distraction phenomenon"…

    Submitted 19 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  9. arXiv:2402.11131  [pdf, other]

    cs.CL cs.AI cs.LG

    Speculative Streaming: Fast LLM Inference without Auxiliary Models

    Authors: Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi

    Abstract: Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model. While effective, in application-specific settings, it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, these draft models add significant complexity to inference s…

    Submitted 16 February, 2024; originally announced February 2024.

  10. arXiv:2312.11514  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM in a flash: Efficient Large Language Model Inference with Limited Memory

    Authors: Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

    Abstract: Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameter…

    Submitted 30 July, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  11. arXiv:2312.09299  [pdf, other]

    cs.LG cs.CL cs.CV

    Weight subcloning: direct initialization of transformers using larger pretrained ones

    Authors: Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari

    Abstract: Training large transformer models from scratch for a target task requires lots of data and is computationally demanding. The usual practice of transfer learning overcomes this challenge by initializing the model with weights of a pretrained model of the same size and specification to increase the convergence and training speed. However, what if no pretrained model of the required size is available…

    Submitted 14 December, 2023; originally announced December 2023.

  12. arXiv:2311.18237  [pdf, other]

    cs.CV cs.LG

    Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

    Authors: Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel

    Abstract: Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to…

    Submitted 1 July, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: International Conference on Machine Learning, 2024

  13. arXiv:2310.15308  [pdf, other]

    cs.CV cs.LG

    SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

    Authors: Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari

    Abstract: The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficient…

    Submitted 10 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  14. arXiv:2310.14108  [pdf, other]

    cs.LG cs.AI cs.CV

    CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

    Authors: Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta

    Abstract: Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual represent…

    Submitted 21 October, 2023; originally announced October 2023.

  15. arXiv:2310.04564  [pdf, other]

    cs.LG cs.AI

    ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

    Authors: Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar

    Abstract: Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstat…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: preprint

  16. arXiv:2310.03937  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Diffusion Models as Masked Audio-Video Learners

    Authors: Elvis Nunez, Yanzi Jin, Mohammad Rastegari, Sachin Mehta, Maxwell Horton

    Abstract: Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations. Aided by the large availability of unlabeled videos, many unsupervised training frameworks have demonstrated impressive results in various downstream audio and video tasks. Recently, Masked Audio-Video Learners (MAViL) has emerged as a state-of-the-art…

    Submitted 4 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Camera-ready version for the Machine Learning for Audio Workshop at NeurIPS 2023

  17. arXiv:2310.00867  [pdf, other]

    cs.CL cs.AI

    Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications

    Authors: Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang

    Abstract: Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs' inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is certain knowledge being forgotten (or erased) after LLM compression, hence necessitating the compressed…

    Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  18. arXiv:2309.04502  [pdf, other]

    cs.CV

    On the Efficacy of Multi-scale Data Samplers for Vision Applications

    Authors: Elvis Nunez, Thomas Merth, Anish Prabhu, Mehrdad Farajtabar, Mohammad Rastegari, Sachin Mehta, Maxwell Horton

    Abstract: Multi-scale resolution training has seen an increased adoption across multiple vision tasks, including classification and detection. Training with smaller resolutions enables faster training at the expense of a drop in accuracy. Conversely, training with larger resolutions has been shown to improve performance, but memory constraints often make this infeasible. In this paper, we empirically study…

    Submitted 8 September, 2023; originally announced September 2023.

  19. arXiv:2309.00964  [pdf, other]

    cs.LG cs.AI

    eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

    Authors: Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal

    Abstract: Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, wei…

    Submitted 13 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: preprint

  20. arXiv:2306.00238  [pdf, other]

    cs.CV

    Bytes Are All You Need: Transformers Operating Directly On File Bytes

    Authors: Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari

    Abstract: Modern deep learning approaches usually utilize modality-specific processing. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate modality-independent representation learning by performing classification directly on file bytes, without the need for decoding f…

    Submitted 1 July, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Journal ref: Transactions on Machine Learning Research 2835-8856 (2024)

  21. arXiv:2303.08983  [pdf, other]

    cs.CV cs.AI cs.LG

    Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

    Authors: Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad Farajtabar, Ali Farhadi, Mohammad Rastegari, Oncel Tuzel

    Abstract: We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-base…

    Submitted 22 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted at International Conference on Computer Vision (ICCV) 2023. v2: Camera-ready version with new Tables 9 and 10. v3: Correction to Table 7-Avg. column

  22. arXiv:2212.10553  [pdf, other]

    cs.CV cs.AI cs.LG

    RangeAugment: Efficient Online Augmentation with Range Learning

    Authors: Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari

    Abstract: State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Technical report (22 pages including references and appendix)

  23. arXiv:2207.10237  [pdf, other]

    cs.CV

    SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

    Authors: Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, Mohammad Rastegari

    Abstract: Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation on…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022

  24. arXiv:2206.02680  [pdf, other]

    cs.CV cs.AI cs.LG

    Separable Self-attention for Mobile Vision Transformers

    Authors: Sachin Mehta, Mohammad Rastegari

    Abstract: Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across several mobile vision tasks, including classification and detection. Though these models have fewer parameters, they have high latency as compared to convolutional neural network-based models. The main efficiency bottleneck in MobileViT is the multi-headed self-attention (MHA) in transformers, which requires…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: Technical report

  25. arXiv:2206.02002  [pdf, other]

    cs.CV cs.LG

    CVNets: High Performance Library for Computer Vision

    Authors: Sachin Mehta, Farzad Abdolhosseini, Mohammad Rastegari

    Abstract: We introduce CVNets, a high-performance open-source library for training deep neural networks for visual recognition tasks, including classification, detection, and segmentation. CVNets supports image and video understanding tools, including data loading, data transformations, novel data sampling methods, and implementations of several standard networks with similar or better performance than prev…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: Technical report

  26. arXiv:2112.13192  [pdf]

    eess.SP eess.SY

    A Comprehensive Review of Myoelectric Prosthesis Control

    Authors: Mohammad Reza Mohebbian, Marjan Nosouhi, Farzaneh Fazilati, Zahra Nasr Esfahani, Golnaz Amiri, Negar Malekifar, Fatemeh Yusefi, Mohsen Rastegari, Hamid Reza Marateb

    Abstract: Prosthetic hands can be used to support upper-body amputees. Myoelectric prosthesis, one of the externally-powered active prosthesis categories, requires proper processing units in addition to recording electrodes and instrumentation amplifiers. In this paper, the following myoelectric prosthesis control methods were discussed in detail: On-off and finite-state, proportional, direct, and posture,…

    Submitted 25 December, 2021; originally announced December 2021.

    Comments: 46 pages

    MSC Class: 92C55

  27. arXiv:2110.04252  [pdf, other]

    cs.LG cs.CV

    LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

    Authors: Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari

    Abstract: When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees. Computational resources need to be conserved when load from other processes is high or battery power is low. Inspired by recent works on neural network…

    Submitted 8 October, 2021; originally announced October 2021.

  28. arXiv:2110.03860  [pdf, other]

    cs.CV cs.LG

    Token Pooling in Vision Transformers

    Authors: Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel

    Abstract: Despite the recent success in many applications, the high computational requirements of vision transformers limit their use in resource-constrained settings. While many existing methods improve the quadratic complexity of attention, in most vision transformers, self-attention is not the major computation bottleneck, e.g., more than 80% of the computation is spent on fully-connected layers. To impr…

    Submitted 11 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023

  29. arXiv:2110.02178  [pdf, other]

    cs.CV cs.AI cs.LG

    MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

    Authors: Sachin Mehta, Mohammad Rastegari

    Abstract: Light-weight convolutional neural networks (CNNs) are the de-facto for mobile vision tasks. Their spatial inductive biases allow them to learn representations with fewer parameters across different vision tasks. However, these networks are spatially local. To learn global representations, self-attention-based vision transformers (ViTs) have been adopted. Unlike CNNs, ViTs are heavy-weight. In thi…

    Submitted 4 March, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted at ICLR'22

  30. arXiv:2108.12659  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

    Authors: Minsik Cho, Keivan A. Vahid, Saurabh Adya, Mohammad Rastegari

    Abstract: Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joi…

    Submitted 20 February, 2022; v1 submitted 28 August, 2021; originally announced August 2021.

    Comments: ICLR 2022

  31. arXiv:2102.10472  [pdf, other]

    cs.LG cs.CV

    Learning Neural Network Subspaces

    Authors: Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari

    Abstract: Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method and in a single…

    Submitted 12 September, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

  32. arXiv:2011.09058  [pdf, other]

    cs.CV

    Layer-Wise Data-Free CNN Compression

    Authors: Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari

    Abstract: We present a computationally efficient method for compressing a trained neural network without using real data. We break the problem of data-free network compression into independent layer-wise compressions. We show how to efficiently generate layer-wise training data using only a pretrained network. We use this data to perform independent layer-wise compressions on the pretrained network. We also…

    Submitted 19 May, 2022; v1 submitted 17 November, 2020; originally announced November 2020.

  33. arXiv:2006.14769  [pdf, other]

    cs.LG cs.AI stat.ML

    Supermasks in Superposition

    Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

    Abstract: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not…

    Submitted 21 October, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 Camera Ready

  34. arXiv:1911.13299  [pdf, other]

    cs.CV cs.LG

    What's Hidden in a Randomly Weighted Neural Network?

    Authors: Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari

    Abstract: Training a neural network is synonymous with learning the values of the weights. By contrast, we demonstrate that randomly weighted neural networks contain subnetworks which achieve impressive performance without ever training the weight values. Hidden in a randomly weighted Wide ResNet-50 we show that there is a subnetwork (with random weights) that is smaller than, but matches the performance of…

    Submitted 30 March, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

    Comments: Accepted to CVPR 2020

  35. arXiv:1911.12385  [pdf, other]

    cs.CL cs.LG

    DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling

    Authors: Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi

    Abstract: For sequence models with large vocabularies, a majority of network parameters lie in the input and output layers. In this work, we describe a new method, DeFINE, for learning deep token representations efficiently. Our architecture uses a hierarchical structure with novel skip-connections which allows for the use of low dimensional input and output layers, reducing total parameters and training ti…

    Submitted 5 February, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

    Comments: Accepted at ICLR 2020

  36. arXiv:1906.05388  [pdf, other]

    cs.CV

    Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors

    Authors: Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, Babak N. Araabi

    Abstract: We present a simple and effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. During network training, we carefully feed in localization information. We excite certain activations in order to help the network learn to better localize. In the later stages of training, we gradually reduce our assisted excitation to zero. We reached a…

    Submitted 12 June, 2019; originally announced June 2019.

  37. arXiv:1906.03516  [pdf, other]

    cs.CV cs.LG eess.IV

    DiCENet: Dimension-wise Convolutions for Efficient Networks

    Authors: Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari

    Abstract: We introduce a novel and generic convolutional unit, DiCE unit, that is built using dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply light-weight convolutional filtering across each dimension of the input tensor while dimension-wise fusion efficiently combines these dimension-wise representations, allowing the DiCE unit to efficiently encode spatial and…

    Submitted 30 November, 2020; v1 submitted 8 June, 2019; originally announced June 2019.

    Comments: Accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  38. arXiv:1906.02256  [pdf, other]

    cs.CV cs.LG

    Butterfly Transform: An Efficient FFT Based Neural Architecture Design

    Authors: Keivan Alizadeh Vahid, Anish Prabhu, Ali Farhadi, Mohammad Rastegari

    Abstract: In this paper, we show that extending the butterfly operations from the FFT algorithm to a general Butterfly Transform (BFT) can be beneficial in building an efficient block structure for CNN designs. Pointwise convolutions, which we refer to as channel fusions, are the main computational bottleneck in the state-of-the-art efficient CNNs (e.g. MobileNets). We introduce a set of criteria for chann…

    Submitted 16 April, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

  39. arXiv:1906.00586  [pdf, other]

    cs.LG cs.CV cs.NE

    Discovering Neural Wirings

    Authors: Mitchell Wortsman, Ali Farhadi, Mohammad Rastegari

    Abstract: The success of neural networks has driven a shift in focus from feature engineering to architecture engineering. However, successful networks today are constructed using a small and manually defined set of building blocks. Even in methods of neural architecture search (NAS) the network connectivity patterns are largely constrained. In this work we propose a method for discovering neural wirings. W…

    Submitted 16 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019 Camera Ready

  40. arXiv:1906.00067  [pdf, other]

    cs.CV cs.CL

    OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

    Authors: Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

    Abstract: Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions such as simple counting, visual attributes, and object detection that do not require reasoning or knowledge beyond what is in the image. In this paper, we addre…

    Submitted 4 September, 2019; v1 submitted 31 May, 2019; originally announced June 2019.

    Comments: CVPR 2019

  41. arXiv:1904.05879  [pdf, other]

    cs.CV cs.AI cs.MA

    Two Body Problem: Collaborative Visual Task Completion

    Authors: Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander Schwing, Aniruddha Kembhavi

    Abstract: Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element in collaboration is commu…

    Submitted 11 April, 2019; originally announced April 2019.

    Comments: Accepted to CVPR 2019

  42. arXiv:1812.05262 [pdf, other]

    cs.CV

    ELASTIC: Improving CNNs with Dynamic Scaling Policies

    Authors: Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, Mohammad Rastegari

    Abstract: Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). We argue that the scaling policy should be learned from data. In this paper, we introduce ELASTIC, a simple, efficient and yet very effective a…

    Submitted 8 April, 2019; v1 submitted 13 December, 2018; originally announced December 2018.

    Comments: CVPR 2019 oral, code available https://github.com/allenai/elastic

  43. arXiv:1812.00971 [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning

    Authors: Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

    Abstract: Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. As we learn a task, we keep learning about it while performing the task. What we learn and how we learn it varies during different stages of learning. Learning how to learn and adapt is a key property that enables us to generalize effortlessly to new settin…

    Submitted 26 March, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

  44. arXiv:1811.11431 [pdf, other]

    cs.CV

    ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network

    Authors: Sachin Mehta, Mohammad Rastegari, Linda Shapiro, Hannaneh Hajishirzi

    Abstract: We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. Our network uses group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters. The performance of our network is evaluated on four different tasks: (1) obj…

    Submitted 30 March, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: Accepted at CVPR'19

  45. arXiv:1808.09029 [pdf, other]

    cs.CL

    Pyramidal Recurrent Unit for Language Modeling

    Authors: Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi

    Abstract: LSTMs are powerful tools for modeling contextual information, as evidenced by their success at the task of language modeling. However, modeling contexts in very high dimensional space can lead to poor generalizability. We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. PRUs replace t…

    Submitted 27 August, 2018; originally announced August 2018.

    Comments: Accepted as a long paper in EMNLP 2018

  46. arXiv:1805.02641 [pdf, other]

    cs.CV

    Label Refinery: Improving ImageNet Classification through Label Progression

    Authors: Hessam Bagherinezhad, Maxwell Horton, Mohammad Rastegari, Ali Farhadi

    Abstract: Among the three main components (data, labels, and models) of any supervised learning system, data and models have been the main subjects of active research. However, studying labels and their properties has received very little attention. Current principles and paradigms of labeling impose several challenges to machine learning algorithms. Labels are often incomplete, ambiguous, and redundant. In…

    Submitted 7 May, 2018; originally announced May 2018.

  47. arXiv:1803.06815 [pdf, other]

    cs.CV

    ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

    Authors: Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, Hannaneh Hajishirzi

    Abstract: We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high resolution images under resource constraints. ESPNet is based on a new convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power. ESPNet is 22 times faster (on a standard GPU) and 180 times smaller than the state-of-the-art semantic se…

    Submitted 24 July, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: Accepted at ECCV'18

  48. arXiv:1712.03316 [pdf, other]

    cs.CV

    IQA: Visual Question Answering in Interactive Environments

    Authors: Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, Ali Farhadi

    Abstract: We introduce Interactive Question Answering (IQA), the task of answering questions that require an autonomous agent to interact with a dynamic visual environment. IQA presents the agent with a scene and a question, like: "Are there any apples in the fridge?" The agent must navigate around the scene, acquire visual understanding of scene elements, interact with objects (e.g. open refrigerators) and…

    Submitted 6 September, 2018; v1 submitted 8 December, 2017; originally announced December 2017.

    Comments: Published in CVPR 2018

  49. arXiv:1611.06473 [pdf, other]

    cs.CV

    LCNN: Lookup-based Convolutional Neural Network

    Authors: Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

    Abstract: Porting state-of-the-art deep learning algorithms to resource-constrained compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for convolutional neural networks that enables efficient learning and inference. We introduce LCNN, a lookup-based convolutional neural network that encodes convolutions by few lookups to a dictionary that is t…

    Submitted 12 June, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

    Comments: CVPR 2017

  50. arXiv:1609.09220 [pdf, other]

    cs.CV

    CNN-aware Binary Map for General Semantic Segmentation

    Authors: Mahdyar Ravanbakhsh, Hossein Mousavi, Moin Nabi, Mohammad Rastegari, Carlo Regazzoni

    Abstract: In this paper we introduce a novel method for general semantic segmentation that can benefit from general semantics of Convolutional Neural Network (CNN). Our segmentation proposes visually and semantically coherent image segments. We use binary encoding of CNN features to overcome the difficulty of clustering in the high-dimensional CNN feature space. These binary codes are very robust agains…

    Submitted 29 September, 2016; originally announced September 2016.

    Comments: ICIP 2016 Best Paper / Student Paper Finalist