
Showing 1–13 of 13 results for author: Zettlemoyer, L

Searching in archive stat.
  1. arXiv:2502.18969 [pdf, other]

    cs.LG cs.AI cs.CL stat.ME

    (Mis)Fitting: A Survey of Scaling Laws

    Authors: Margaret Li, Sneha Kudugunta, Luke Zettlemoyer

    Abstract: Modern foundation models rely heavily on scaling laws to guide crucial training decisions. Researchers often extrapolate the optimal architecture and hyperparameter settings from smaller training runs by describing the relationship between loss (or task performance) and scale. All components of this process vary, from the specific equation being fit, to the training setup, to the optimiza…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 41 pages, 3 figures, first two authors contributed equally. ICLR 2025

  2. arXiv:2412.01339 [pdf, other]

    cs.CV cs.AI cs.GR cs.LG stat.ML

    Negative Token Merging: Image-based Adversarial Feature Guidance

    Authors: Jaskirat Singh, Lindsey Li, Weijia Shi, Ranjay Krishna, Yejin Choi, Pang Wei Koh, Michael F. Cohen, Stephen Gould, Liang Zheng, Luke Zettlemoyer

    Abstract: Text-based adversarial guidance using a negative prompt has emerged as a widely adopted approach to steer diffusion models away from producing undesired concepts. While useful, adversarial guidance using text alone can be insufficient to capture complex visual concepts or to avoid specific visual elements such as copyrighted characters. In this paper, we explore for the first time an alternat…

    Submitted 5 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

  3. arXiv:2411.05877 [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

    Authors: Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng

    Abstract: Large language models (LMs) are typically adapted to improve performance on new contexts (e.g., text prompts that define new tasks or domains) through fine-tuning or prompting. However, there is an accuracy-compute trade-off: fine-tuning incurs significant training cost, and prompting increases inference overhead. We introduce GenerativeAdapter, an effective and efficient adaptation method that di…

    Submitted 7 November, 2024; originally announced November 2024.

  4. arXiv:2103.12528 [pdf, other]

    cs.CL cs.AI stat.ML

    Multilingual Autoregressive Entity Linking

    Authors: Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

    Abstract: We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem: the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token by token, in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 20 pages, 8 figures, and 11 tables

  5. arXiv:2008.03156 [pdf, other]

    cs.LG cs.CL stat.ML

    Better Fine-Tuning by Reducing Representational Collapse

    Authors: Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

    Abstract: Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or…

    Submitted 5 August, 2020; originally announced August 2020.

  6. arXiv:2006.15020 [pdf, other]

    cs.CL cs.LG stat.ML

    Pre-training via Paraphrasing

    Authors: Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

    Abstract: We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multilingual, multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of genera…

    Submitted 26 June, 2020; originally announced June 2020.

  7. arXiv:2004.01655 [pdf, other]

    cs.CL cs.LG stat.ML

    Aligned Cross Entropy for Non-Autoregressive Machine Translation

    Authors: Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

    Abstract: Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propos…

    Submitted 3 April, 2020; originally announced April 2020.

  8. arXiv:2001.08785 [pdf, other]

    cs.CL cs.LG stat.ML

    Semi-Autoregressive Training Improves Mask-Predict Decoding

    Authors: Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer

    Abstract: The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of…

    Submitted 23 January, 2020; originally announced January 2020.

  9. arXiv:1910.13461 [pdf, other]

    cs.CL cs.LG stat.ML

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

    Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT…

    Submitted 29 October, 2019; originally announced October 2019.

  10. arXiv:1907.04840 [pdf, other]

    cs.LG cs.NE stat.ML

    Sparse Networks from Scratch: Faster Training without Losing Performance

    Authors: Tim Dettmers, Luke Zettlemoyer

    Abstract: We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse momentum, an algorithm which uses exponentially smoothed gradients (momentum) to identify layers and weights which reduce the error efficiently. Sparse momentum…

    Submitted 23 August, 2019; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: 9-page NeurIPS 2019 submission

  11. arXiv:1904.09324 [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Mask-Predict: Parallel Decoding of Conditional Masked Language Models

    Authors: Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

    Abstract: Most machine translation systems generate text autoregressively from left to right. Instead, we use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively,…

    Submitted 4 September, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: EMNLP 2019

  12. arXiv:1805.03716 [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

    Authors: Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer

    Abstract: LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM's gates from the embedded simple RNN, producing a n…

    Submitted 9 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  13. arXiv:1210.4889 [pdf]

    cs.LG cs.AI stat.ML

    Learning STRIPS Operators from Noisy and Incomplete Observations

    Authors: Kira Mourao, Luke S. Zettlemoyer, Ronald P. A. Petrick, Mark Steedman

    Abstract: Agents learning to act autonomously in real-world domains must acquire a model of the dynamics of the domain in which they operate. Learning domain dynamics can be challenging, especially where an agent only has partial access to the world state, and/or noisy external sensors. Even in standard STRIPS domains, existing approaches cannot learn from noisy, incomplete observations typical of real-worl…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-614-623
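The title of entry 12 describes the LSTM memory cell as a dynamically computed element-wise weighted sum. One way to see that claim, as a minimal sketch assuming the standard cell update c_t = i_t ⊙ c̃_t + f_t ⊙ c_{t-1} starting from a zero cell state, is to unroll the recurrence: c_t = Σ_{j≤t} (i_j ⊙ Π_{k=j+1..t} f_k) ⊙ c̃_j, so each candidate c̃_j enters with a weight built from the gates. The code below checks that identity numerically. The gate and candidate values are random placeholders rather than outputs of a trained network, and the helper names (`lstm_cells`, `weighted_sum_cell`) are hypothetical, not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cells(input_gate, forget_gate, candidates):
    """Run the cell recurrence c_t = i_t * c~_t + f_t * c_{t-1}, starting from a zero cell.

    Each argument is a list over time steps of per-dimension values;
    all products are element-wise. Returns the cell states for every step.
    """
    cell = [0.0] * len(candidates[0])
    history = []
    for i_t, f_t, cand_t in zip(input_gate, forget_gate, candidates):
        cell = [i * g + f * prev for i, g, f, prev in zip(i_t, cand_t, f_t, cell)]
        history.append(cell)
    return history


def weighted_sum_cell(input_gate, forget_gate, candidates, t):
    """Unrolled form: c_t = sum_{j<=t} (i_j * prod_{k=j+1..t} f_k) * c~_j, element-wise."""
    dim = len(candidates[0])
    c_t = [0.0] * dim
    for j in range(t + 1):
        for d in range(dim):
            weight = input_gate[j][d]
            for k in range(j + 1, t + 1):
                weight *= forget_gate[k][d]  # later forget gates decay older content
            c_t[d] += weight * candidates[j][d]
    return c_t


# Placeholder gates in (0, 1) and candidates in (-1, 1); not from a trained model.
rng = random.Random(0)
T, D = 5, 3
input_gate = [[sigmoid(rng.gauss(0, 1)) for _ in range(D)] for _ in range(T)]
forget_gate = [[sigmoid(rng.gauss(0, 1)) for _ in range(D)] for _ in range(T)]
candidates = [[math.tanh(rng.gauss(0, 1)) for _ in range(D)] for _ in range(T)]

recurrent = lstm_cells(input_gate, forget_gate, candidates)
for t in range(T):
    unrolled = weighted_sum_cell(input_gate, forget_gate, candidates, t)
    assert all(abs(a - b) < 1e-12 for a, b in zip(recurrent[t], unrolled))
print("recurrence and weighted-sum forms agree")
```

Because the identity uses only the cell update, it holds no matter how the gates are computed; in an actual LSTM they depend on the current input and the previous hidden state, which is what makes the weights dynamic.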