Showing 1–17 of 17 results for author: Levy, O

Searching in archive stat.
  1. arXiv:2409.08570  [pdf, other]

    cs.LG stat.ML

    Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

    Authors: Asaf Cassel, Orin Levy, Yishay Mansour

    Abstract: Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL). Most works achieve this by carefully estimating the model uncertainty and following the so-called optimistic model. Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that provably achieves near-optimal regret for stochastic…

    Submitted 13 September, 2024; originally announced September 2024.

  2. arXiv:2305.14196  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding

    Authors: Uri Shaham, Maor Ivgi, Avia Efrat, Jonathan Berant, Omer Levy

    Abstract: We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts, which contains only test and small validation sets, without training data. We adapt six tasks from the SCROLLS benchmark, and add four new datasets, including two novel information fusing tasks, such as aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a comprehensive eva…

    Submitted 17 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of EMNLP 2023

  3. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  4. arXiv:2202.00206  [pdf]

    cs.HC eess.SP q-bio.QM stat.AP

    A pilot study of the Earable device to measure facial muscle and eye movement tasks among healthy volunteers

    Authors: Matthew F. Wipperman, Galen Pogoncheff, Katrina F. Mateo, Xuefang Wu, Yiziying Chen, Oren Levy, Andreja Avbersek, Robin R. Deterding, Sara C. Hamon, Tam Vu, Rinol Alaj, Olivier Harari

    Abstract: Many neuromuscular disorders impair function of cranial nerve innervated muscles. Clinical assessment of cranial muscle function has several limitations. Clinician rating of symptoms suffers from inter-rater variation, qualitative or semi-quantitative scoring, and limited ability to capture infrequent or fluctuating symptoms. Patient-reported outcomes are limited by recall bias and poor precision.…

    Submitted 31 January, 2022; originally announced February 2022.

  5. arXiv:2201.03533  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    SCROLLS: Standardized CompaRison Over Long Language Sequences

    Authors: Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy

    Abstract: NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing infor…

    Submitted 11 October, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: EMNLP 2022

  6. arXiv:2103.01242  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

    Authors: Avia Efrat, Uri Shaham, Dan Kilman, Omer Levy

    Abstract: Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and…

    Submitted 1 November, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: EMNLP 2021

  7. arXiv:2008.09396  [pdf, other]

    cs.CL cs.LG stat.ML

    Neural Machine Translation without Embeddings

    Authors: Uri Shaham, Omer Levy

    Abstract: Many NLP models operate over sequences of subword tokens produced by hand-crafted tokenization rules and heuristic subword induction algorithms. A simple universal alternative is to represent every computerized text as a sequence of bytes via UTF-8, obviating the need for an embedding layer since there are fewer token types (256) than dimensions. Surprisingly, replacing the ubiquitous embedding la…

    Submitted 12 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: NAACL 2021

  8. arXiv:2004.01655  [pdf, other]

    cs.CL cs.LG stat.ML

    Aligned Cross Entropy for Non-Autoregressive Machine Translation

    Authors: Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

    Abstract: Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propos…

    Submitted 3 April, 2020; originally announced April 2020.

  9. arXiv:2001.08785  [pdf, other]

    cs.CL cs.LG stat.ML

    Semi-Autoregressive Training Improves Mask-Predict Decoding

    Authors: Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer

    Abstract: The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of…

    Submitted 23 January, 2020; originally announced January 2020.

  10. arXiv:1910.13461  [pdf, other]

    cs.CL cs.LG stat.ML

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

    Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT…

    Submitted 29 October, 2019; originally announced October 2019.

  11. arXiv:1910.00577  [pdf, other]

    cs.LG cs.PL stat.ML

    Structural Language Models of Code

    Authors: Uri Alon, Roy Sadaka, Omer Levy, Eran Yahav

    Abstract: We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree - structural language modeling (SLM). SLM estimates the probability of the program's abstrac…

    Submitted 29 July, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: Appeared in ICML'2020

  12. arXiv:1904.09324  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Mask-Predict: Parallel Decoding of Conditional Masked Language Models

    Authors: Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

    Abstract: Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively,…

    Submitted 4 September, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: EMNLP 2019

  13. arXiv:1902.01509  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

    Authors: Vladimir Karpukhin, Omer Levy, Jacob Eisenstein, Marjan Ghazvininejad

    Abstract: We consider the problem of making machine translation more robust to character-level variation at the source side, such as typos. Existing methods achieve greater coverage by applying subword models such as byte-pair encoding (BPE) and character-level encoders, but these methods are highly sensitive to spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatica…

    Submitted 4 February, 2019; originally announced February 2019.

  14. arXiv:1808.01400  [pdf, other]

    cs.LG cs.PL stat.ML

    code2seq: Generating Sequences from Structured Representations of Code

    Authors: Uri Alon, Shaked Brody, Omer Levy, Eran Yahav

    Abstract: The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present ${\rm {\scriptsize CODE2SEQ}}$:…

    Submitted 21 February, 2019; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: Accepted to ICLR'2019

  15. arXiv:1805.03716  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

    Authors: Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer

    Abstract: LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM's gates from the embedded simple RNN, producing a n…

    Submitted 9 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  16. arXiv:1803.09473  [pdf, other]

    cs.LG cs.AI cs.PL stat.ML

    code2vec: Learning Distributed Representations of Code

    Authors: Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav

    Abstract: We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length $\textit{code vector}$, which can be used to predict semantic properties of the snippet. This is performed by decomposing code to a collection of paths in its abstract syntax tree, and learning the atomic representa…

    Submitted 30 October, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: Accepted in POPL 2019

  17. arXiv:1402.3722  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

    Authors: Yoav Goldberg, Omer Levy

    Abstract: The word2vec software of Tomas Mikolov and colleagues (https://code.google.com/p/word2vec/) has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in two research papers. We found the description of the models in these papers to be somewhat cryptic and hard to follow. While the motivations and presentation may be o…

    Submitted 15 February, 2014; originally announced February 2014.