Showing 1–1 of 1 results for author: Chahed, I

  1. arXiv:2410.05355 [pdf, other]

    cs.CL cs.AI

    Falcon Mamba: The First Competitive Attention-free 7B Language Model

    Authors: Jingwei Zuo, Maksim Velikanov, Dhia Eddine Rhaiem, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, Hakim Hacid

    Abstract: In this technical report, we present Falcon Mamba 7B, a new base large language model based on the novel Mamba architecture. Falcon Mamba 7B is trained on 5.8 trillion tokens with carefully selected data mixtures. As a pure Mamba-based model, Falcon Mamba 7B surpasses leading open-weight models based on Transformers, such as Mistral 7B, Llama3.1 8B, and Falcon2 11B. It is on par with Gemma 7B and…

    Submitted 7 October, 2024; originally announced October 2024.