
Showing 1–3 of 3 results for author: Keiblinger, M

  1. arXiv:2505.14065  [pdf, ps, other]

    cs.DC

    Prime Collective Communications Library -- Technical Report

    Authors: Michael Keiblinger, Mario Sieg, Jack Min Ong, Sami Jaghouar, Johannes Hagemann

    Abstract: This report presents the Prime Collective Communications Library (PCCL), a novel fault-tolerant collective communication library designed for distributed ML workloads over the public internet. PCCL introduces a new programming model that enables dynamic peer joining and failure recovery. The library implements efficient collective operations like all-reduce while providing robust fault tolerance m…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 31 pages, 5 figures

  2. arXiv:2505.07291  [pdf, ps, other]

    cs.LG cs.DC

    INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

    Authors: Prime Intellect Team, Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Kushal Thaman, Matthew Di Ferrante, Felix Gabriel, Fares Obeid, Kemal Erdem, Michael Keiblinger, Johannes Hagemann

    Abstract: We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors. To enable a training run with this unique infrastructure, we built…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 26 pages, 12 figures

  3. arXiv:2412.01152  [pdf, other]

    cs.DC

    INTELLECT-1 Technical Report

    Authors: Sami Jaghouar, Jack Min Ong, Manveer Basra, Fares Obeid, Jannik Straube, Michael Keiblinger, Elie Bakouch, Lucas Atkins, Maziyar Panahi, Charles Goddard, Max Ryabinin, Johannes Hagemann

    Abstract: In this report, we introduce INTELLECT-1, the first 10 billion parameter language model collaboratively trained across the globe, demonstrating that large-scale model training is no longer confined to large corporations but can be achieved through a distributed, community-driven approach. INTELLECT-1 was trained on 1 trillion tokens using up to 14 concurrent nodes distributed across 3 continents,…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 19 pages, 6 figures