Showing 1–5 of 5 results for author: Jannai, D

  1. arXiv:2408.12570

    cs.CL cs.LG

    Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

    Authors: Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, Edden M Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, Ido Blass, Inbal Magar, et al. (36 additional authors not shown)

    Abstract: We present Jamba-1.5, new instruction-tuned large language models based on our Jamba architecture. Jamba is a hybrid Transformer-Mamba mixture of experts architecture, providing high throughput and low memory usage across context lengths, while retaining the same or better quality as Transformer models. We release two model sizes: Jamba-1.5-Large, with 94B active parameters, and Jamba-1.5-Mini, wi…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Webpage: https://www.ai21.com/jamba

  2. arXiv:2305.20010

    cs.AI cs.CL cs.CY cs.HC

    Human or Not? A Gamified Approach to the Turing Test

    Authors: Daniel Jannai, Amos Meron, Barak Lenz, Yoav Levine, Yoav Shoham

    Abstract: We present "Human or Not?", an online game inspired by the Turing test, that measures the capability of AI chatbots to mimic humans in dialog, and of humans to tell bots from other humans. Over the course of a month, the game was played by over 1.5 million users who engaged in anonymous two-minute chat sessions with either another human or an AI language model which was prompted to behave like hum…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 11 pages, 6 figures

    MSC Class: 68T50; ACM Class: I.2.7

  3. arXiv:2204.10019

    cs.CL cs.AI

    Standing on the Shoulders of Giant Frozen Language Models

    Authors: Yoav Levine, Itay Dalmedigos, Ori Ram, Yoel Zeldes, Daniel Jannai, Dor Muhlgay, Yoni Osin, Opher Lieber, Barak Lenz, Shai Shalev-Shwartz, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

    Abstract: Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-t…

    Submitted 21 April, 2022; originally announced April 2022.

  4. arXiv:2110.04541

    cs.CL cs.LG

    The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

    Authors: Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua

    Abstract: Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example, than it can…

    Submitted 21 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  5. arXiv:2105.03928

    cs.LG cs.CL

    Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

    Authors: Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua

    Abstract: After their successful debut in natural language processing, Transformer architectures are now becoming the de-facto standard in many domains. An obstacle for their deployment over new modalities is the architectural configuration: the optimal depth-to-width ratio has been shown to dramatically vary across data types (e.g., $10$x larger over images than over language). We theoretically predict the…

    Submitted 9 June, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: ICML 2021