
Showing 1–31 of 31 results for author: Aghajanyan, A

Searching in archive cs.
  1. arXiv:2412.16326

    cs.CV cs.LG

    When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization

    Authors: Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi

    Abstract: Current image generation methods, such as latent diffusion and discrete token-based generation, depend on a two-stage training approach. In stage 1, an auto-encoder is trained to compress an image into a latent space; in stage 2, a generative model is trained to learn a distribution over that latent space. Most work focuses on maximizing stage 1 performance independent of stage 2, assuming better…

    Submitted 20 December, 2024; originally announced December 2024.

  2. arXiv:2407.21770

    cs.AI cs.LG

    MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

    Authors: Xi Victoria Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan

    Abstract: We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adap…

    Submitted 12 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: v2 -> update related work section; v3 -> fix spelling

  3. arXiv:2407.18897

    cs.LG cs.NE q-bio.QM

    Small Molecule Optimization with Large Language Models

    Authors: Philipp Guevorguian, Menua Bedrosian, Tigran Fahradyan, Gayane Chilingaryan, Hrant Khachatrian, Armen Aghajanyan

    Abstract: Recent advancements in large language models have opened new possibilities for generative molecular drug design. We present Chemlactica and Chemma, two language models fine-tuned on a novel corpus of 110M molecules with computed properties, totaling 40B tokens. These models demonstrate strong performance in generating molecules with specified properties and predicting new molecular characteristics…

    Submitted 26 July, 2024; originally announced July 2024.

  4. arXiv:2405.01582

    cs.CL cs.AI cs.LG

    Text Quality-Based Pruning for Efficient Training of Language Models

    Authors: Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer

    Abstract: In recent times, training language models (LMs) has relied on computationally heavy training over massive datasets, which makes this training process extremely laborious. In this paper, we propose a novel method for numerically evaluating text quality in large unlabelled NLP datasets in a model-agnostic manner to assign the text instances a "quality score". By proposing the text quality metric, th…

    Submitted 10 May, 2024; v1 submitted 26 April, 2024; originally announced May 2024.

  5. arXiv:2310.02804

    cs.CL cs.CV cs.LG

    DOMINO: A Dual-System for Multi-step Visual Language Reasoning

    Authors: Peifang Wang, Olga Golovneva, Armen Aghajanyan, Xiang Ren, Muhao Chen, Asli Celikyilmaz, Maryam Fazel-Zarandi

    Abstract: Visual language reasoning requires a system to extract text or numbers from information-dense images like charts or plots and perform logical or arithmetic reasoning to arrive at an answer. To tackle this task, existing work relies on either (1) an end-to-end vision-language model trained on a large amount of data, or (2) a two-stage pipeline where a captioning model converts the image into text t…

    Submitted 4 October, 2023; originally announced October 2023.

  6. arXiv:2309.15564

    cs.LG cs.CL cs.CV

    Jointly Training Large Autoregressive Multimodal Models

    Authors: Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, Barlas Oguz

    Abstract: In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that…

    Submitted 28 September, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

  7. arXiv:2309.02591

    cs.LG cs.CL cs.CV

    Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

    Authors: Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, et al. (2 additional authors not shown)

    Abstract: We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted fr…

    Submitted 5 September, 2023; originally announced September 2023.

  8. arXiv:2308.12284

    cs.CL cs.AI cs.LG

    D4: Improving LLM Pretraining via Document De-Duplication and Diversification

    Authors: Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos

    Abstract: Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on ever-larger portions of the internet leads to consistent performance improvements, the size of these improvements diminishes with scale, and there ha…

    Submitted 23 August, 2023; originally announced August 2023.

  9. arXiv:2305.07185

    cs.LG

    MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

    Authors: Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis

    Abstract: Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We propose Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes. Megabyte segments sequences into patches and uses a local submodel within patches and a glo…

    Submitted 19 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  10. arXiv:2301.03728

    cs.CL cs.AI cs.LG

    Scaling Laws for Generative Mixed-Modal Language Models

    Authors: Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

    Abstract: Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modaliti…

    Submitted 9 January, 2023; originally announced January 2023.

  11. arXiv:2211.16349

    cs.LG q-bio.BM

    BARTSmiles: Generative Masked Language Models for Molecular Representations

    Authors: Gayane Chilingaryan, Hovhannes Tamoyan, Ani Tevosyan, Nelly Babayan, Lusine Khondkaryan, Karen Hambardzumyan, Zaven Navoyan, Hrant Khachatrian, Armen Aghajanyan

    Abstract: We discover a robust self-supervised strategy tailored towards molecular representations for generative masked language models through a series of tailored, in-depth ablations. Using this pre-training strategy, we train BARTSmiles, a BART-like model with an order of magnitude more compute than previous self-supervised molecular representations. In-depth evaluations show that BARTSmiles consistentl…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 27 pages (including appendix)

  12. arXiv:2211.12561

    cs.CV cs.CL cs.LG

    Retrieval-Augmented Multimodal Language Modeling

    Authors: Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

    Abstract: Recent multimodal models such as DALL-E and CM3 have achieved remarkable progress in text-to-image and image-to-text generation. However, these models store all learned knowledge (e.g., the appearance of the Eiffel Tower) in the model parameters, requiring increasingly larger models and training data to capture more knowledge. To integrate knowledge in a more scalable and modular way, we propose a…

    Submitted 5 June, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Published at ICML 2023. Blog post available at https://cs.stanford.edu/~myasu/blog/racm3/

  13. arXiv:2205.10770

    cs.CL

    Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

    Authors: Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan

    Abstract: Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood. We empirically study exact memorization in causal and masked language modeling, across model sizes and throughout the training process. We measure the effects of dataset size, learning rate, and model size on memorization, finding that larger language models memorize…

    Submitted 2 November, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  14. arXiv:2204.07496

    cs.CL cs.IR

    Improving Passage Retrieval with Zero-Shot Question Generation

    Authors: Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer

    Abstract: We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or…

    Submitted 2 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: EMNLP 2022 camera-ready version. Code is available at: https://github.com/DevSinghSachan/unsupervised-passage-reranking

  15. arXiv:2204.05999

    cs.SE cs.CL cs.LG

    InCoder: A Generative Model for Code Infilling and Synthesis

    Authors: Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

    Abstract: Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and move…

    Submitted 9 April, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: ICLR 2023. v3: camera-ready that includes PLBART and OpenAI baselines

  16. arXiv:2201.07520

    cs.CL

    CM3: A Causal Masked Multimodal Model of the Internet

    Authors: Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

    Abstract: We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of their original positions. The causal masking obje…

    Submitted 19 January, 2022; originally announced January 2022.

  17. arXiv:2110.07731

    cs.CL cs.LG

    CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

    Authors: Patrick Huber, Armen Aghajanyan, Barlas Oğuz, Dmytro Okhonko, Wen-tau Yih, Sonal Gupta, Xilun Chen

    Abstract: With the rise of large-scale pre-trained language models, open-domain question-answering (ODQA) has become an important research topic in NLP. Based on the popular pre-training fine-tuning approach, we posit that an additional in-domain pre-training stage using a large-scale, natural, and diverse question-answering (QA) dataset can be beneficial for ODQA. Consequently, we propose a novel QA datase…

    Submitted 2 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 9 pages, Findings of NAACL 2022

  18. arXiv:2109.14084

    cs.CV cs.CL

    VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

    Authors: Hu Xu, Gargi Ghosh, Po-Yao Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer

    Abstract: We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks. VideoCLIP trains a transformer for video and text by contrasting temporally overlapping positive video-text pairs with hard negatives from nearest neighbor retrieval. Our experiments on a diverse series of downstream tasks, including se…

    Submitted 1 October, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  19. arXiv:2109.10410

    cs.CL cs.IR cs.LG

    RETRONLU: Retrieval Augmented Task-Oriented Semantic Parsing

    Authors: Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, Denis Savenkov

    Abstract: While large pre-trained language models accumulate a lot of knowledge in their parameters, it has been demonstrated that augmenting them with non-parametric retrieval-based memory has a number of benefits, from accuracy improvements to data efficiency, for knowledge-focused tasks such as question answering. In this paper, we apply retrieval-based modeling ideas to the problem of multi-domain t…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: 12 pages, 9 figures, 5 tables

  20. arXiv:2107.06955

    cs.CL cs.LG

    HTLM: Hyper-Text Pre-Training and Prompting of Language Models

    Authors: Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer

    Abstract: We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision (e.g. class and id attributes often encode document category information), and (3) it allows for new structured prompting that follows the established semantics of…

    Submitted 14 July, 2021; originally announced July 2021.

  21. arXiv:2104.04923

    cs.CL cs.LG

    Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog

    Authors: Arun Babu, Akshat Shrivastava, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan Ghazvininejad

    Abstract: Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models. In spite of these advantages, widespread adoption of these models for real-time conversational use cases has been stymied by higher compute requirements and thus higher latency. In this work, we propose a non-autoregressive approach to predict semantic pars…

    Submitted 11 April, 2021; originally announced April 2021.

  22. arXiv:2101.11038

    cs.CL cs.LG

    Muppet: Massive Multi-task Representations with Pre-Finetuning

    Authors: Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta

    Abstract: We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance f…

    Submitted 26 January, 2021; originally announced January 2021.

  23. arXiv:2012.13255

    cs.LG cs.CL

    Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

    Authors: Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta

    Abstract: Although pretrained language models can be fine-tuned to produce state-of-the-art results for a very wide range of language understanding tasks, the dynamics of this process are not well understood, especially in the low data regime. Why can we use relatively vanilla gradient descent algorithms (e.g., without strong regularization) to tune a model with hundreds of millions of parameters on dataset…

    Submitted 22 December, 2020; originally announced December 2020.

  24. arXiv:2009.13655

    cs.CL cs.LG

    Conversational Semantic Parsing

    Authors: Armen Aghajanyan, Jean Maillard, Akshat Shrivastava, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, Sonal Gupta

    Abstract: The structured representation for semantic parsing in task-oriented assistant systems is geared towards simple understanding of one-turn queries. Due to the limitations of the representation, the session-based properties such as co-reference resolution and context carryover are processed downstream in a pipelined system. In this paper, we propose a semantic representation for such task-oriented co…

    Submitted 28 September, 2020; originally announced September 2020.

  25. arXiv:2008.03156

    cs.LG cs.CL stat.ML

    Better Fine-Tuning by Reducing Representational Collapse

    Authors: Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

    Abstract: Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or…

    Submitted 5 August, 2020; originally announced August 2020.

  26. arXiv:2006.15020

    cs.CL cs.LG stat.ML

    Pre-training via Paraphrasing

    Authors: Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

    Abstract: We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of genera…

    Submitted 26 June, 2020; originally announced June 2020.

  27. arXiv:1809.08510

    cs.CL cs.LG stat.ML

    Towards Language Agnostic Universal Representations

    Authors: Armen Aghajanyan, Xia Song, Saurabh Tiwary

    Abstract: When a bilingual student learns to solve word problems in math, we expect the student to be able to solve these problems in both languages the student is fluent in, even if the math lessons were only taught in one language. However, current representations in machine learning are language dependent. In this work, we present a method to decouple the language from the problem by learning language agno…

    Submitted 22 September, 2018; originally announced September 2018.

  28. arXiv:1702.06295

    cs.LG stat.ML

    Convolution Aware Initialization

    Authors: Armen Aghajanyan

    Abstract: Initialization of parameters in deep neural networks has been shown to have a big impact on the performance of the networks (Mishkin & Matas, 2015). The initialization scheme devised by He et al. allowed convolution activations to carry a constrained mean which allowed deep networks to be trained effectively (He et al., 2015a). Orthogonal initializations and more generally orthogonal matrices in s…

    Submitted 27 February, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

  29. arXiv:1609.09522

    cs.LG

    Charged Point Normalization: An Efficient Solution to the Saddle Point Problem

    Authors: Armen Aghajanyan

    Abstract: Recently, the problem of local minima in very high dimensional non-convex optimization has been challenged and the problem of saddle points has been introduced. This paper introduces a dynamic type of normalization that forces the system to escape saddle points. Unlike other saddle point escaping algorithms, second order information is not utilized, and the system can be trained with an arbitrary…

    Submitted 7 February, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

  30. arXiv:1609.06693

    cs.LG

    SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks

    Authors: Armen Aghajanyan

    Abstract: Deep neural networks are learning models with a very high capacity and therefore prone to over-fitting. Many regularization techniques such as Dropout, DropConnect, and weight decay all attempt to solve the problem of over-fitting by reducing the capacity of their respective models (Srivastava et al., 2014), (Wan et al., 2013), (Krogh & Hertz, 1992). In this paper we introduce a new form of regula…

    Submitted 3 December, 2016; v1 submitted 21 September, 2016; originally announced September 2016.

  31. arXiv:1509.01659

    cs.LG

    Gravitational Clustering

    Authors: Armen Aghajanyan

    Abstract: The downfall of many supervised learning algorithms, such as neural networks, is the inherent need for a large amount of training data. Although there is a lot of buzz about big data, there is still the problem of doing classification from a small dataset. Other methods such as support vector machines, although capable of dealing with few samples, are inherently binary classifiers, and are in need…

    Submitted 4 September, 2015; originally announced September 2015.
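
Entries 15 (InCoder) and 16 (CM3) both describe training on sequences in which masked spans are moved out of their original positions and generated at the end, so that an ordinary left-to-right model learns to fill them in using context from both sides. Because the abstracts above are truncated, what follows is only a minimal Python sketch of that data transformation under stated assumptions: tokens are plain strings, the sentinel names `<mask:i>` are hypothetical rather than either paper's actual vocabulary, and `sample_span` draws spans uniformly, a choice made purely for illustration and not the distribution either paper uses.

```python
import random
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices


def sample_span(length: int, rng: random.Random, min_len: int = 1) -> Span:
    """Draw one random span of at least `min_len` tokens.

    Uniform sampling is an illustrative assumption, not the papers' scheme.
    """
    if min_len < 1 or length < min_len:
        raise ValueError("min_len must be >= 1 and no larger than length")
    start = rng.randrange(0, length - min_len + 1)
    end = rng.randrange(start + min_len, length + 1)
    return start, end


def causal_mask(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """Replace each span with a sentinel, then append sentinel + span at the end.

    The output can be trained with a plain left-to-right objective: the
    masked spans are generated last, conditioned on the text both before
    and after their original positions.
    """
    prefix: List[str] = []
    suffix: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        if start < cursor or end <= start or end > len(tokens):
            raise ValueError(f"invalid or overlapping span {(start, end)}")
        sentinel = f"<mask:{i}>"              # hypothetical sentinel token name
        prefix.extend(tokens[cursor:start])  # unmasked text stays in place
        prefix.append(sentinel)              # placeholder at the original position
        suffix.append(sentinel)              # the same sentinel introduces the span...
        suffix.extend(tokens[start:end])     # ...whose tokens are generated at the end
        cursor = end
    prefix.extend(tokens[cursor:])
    return prefix + suffix


if __name__ == "__main__":
    rng = random.Random(0)
    toks = "def add ( a , b ) : return a + b".split()
    span = sample_span(len(toks), rng, min_len=2)
    print(" ".join(causal_mask(toks, [span])))
```

Running the script prints the example sequence with one random span replaced by `<mask:0>` and re-appended after the same sentinel at the end. In this sketch, reusing one sentinel at both positions is what would let a model trained on such data infill a region when prompted with the surrounding text followed by that sentinel.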