Showing 1–1 of 1 results for author: Blázquez, G M

  1. arXiv:2502.02737

    cs.CL

    SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

    Authors: Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf

    Abstract: While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in resource-constrained settings. In this paper, we document the development of SmolLM2, a state-of-the-art "small" (1.7 billion parameter) language model (LM). To attain strong performance, we overtrain…

    Submitted 4 February, 2025; originally announced February 2025.