
Showing 1–11 of 11 results for author: Weinbach, S

  1. arXiv:2410.03730

    cs.CL cs.AI cs.LG

    Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs

    Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, Richard Rutmann, Charvi Jain, Max Lübbering, Daniel Steinigen, Johannes Leveling, Katrin Klug, Jasper Schulze Buschhoff, Lena Jurkschat, Hammam Abdelwahab, Benny Jörg Stein, Karl-Heinz Sylla, Pavel Denisov, Nicolò Brandizzi, Qasid Saleem, Anirban Bhowmick, Lennard Helmer, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Alex Jude, et al. (14 additional authors not shown)

    Abstract: We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset comprising around 60% non-English data and utilizing a custom multilingual tokenizer, our models address the limitations of existing LLMs that predominantly focus on English or a few high-resource languages. We detail the models' dev…

    Submitted 15 October, 2024; v1 submitted 30 September, 2024; originally announced October 2024.

  2. arXiv:2407.17465

    cs.LG

    u-μP: The Unit-Scaled Maximal Update Parametrization

    Authors: Charlie Blake, Constantin Eichenberg, Josef Dean, Lukas Balles, Luke Y. Prince, Björn Deiseroth, Andres Felipe Cruz-Salinas, Carlo Luschi, Samuel Weinbach, Douglas Orr

    Abstract: The Maximal Update Parametrization (μP) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-μP, which improves upon μP by combining it with Unit Scaling, a method for designing models that makes them easy to train in low-precision. The two tech…

    Submitted 10 January, 2025; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: 55 pages

  3. arXiv:2406.19223

    cs.CL cs.AI cs.LG

    T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings

    Authors: Björn Deiseroth, Manuel Brack, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

    Abstract: Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and unnecessarily large embedding and head layers. Additionally, their performance is biased towards a reference corpus, leading to reduced effectiveness for underr…

    Submitted 7 January, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2311.05610

    cs.LG cs.DC

    Efficient Parallelization Layouts for Large-Scale Distributed Model Training

    Authors: Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo

    Abstract: Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence paralle…

    Submitted 24 September, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Camera-ready version for the First Conference on Language Modeling (COLM 2024)

  5. arXiv:2310.08754

    cs.LG

    Tokenizer Choice For LLM Training: Negligible or Crucial?

    Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr

    Abstract: The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes and advancements in pretraining objectives, leaving tokenizer influence as a blind spot. Shedding light on this underexplored area, we conduct a comprehensive study on the influence of tokenizer choice on LLM downstream perf…

    Submitted 17 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

  6. arXiv:2305.15296

    cs.CV cs.AI cs.LG

    MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

    Authors: Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

    Abstract: The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion that all…

    Submitted 20 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Proceedings of Advances in Neural Information Processing Systems: Annual Conference on Neural Information Processing Systems (NeurIPS)

  7. arXiv:2301.08110

    cs.LG cs.AI

    AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation

    Authors: Björn Deiseroth, Mayukh Deb, Samuel Weinbach, Manuel Brack, Patrick Schramowski, Kristian Kersting

    Abstract: Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require prohibitively large amounts of extra memory, since they rely on backpropagation which allocates almost twice as much GPU memory as the forward pass…

    Submitted 7 January, 2025; v1 submitted 19 January, 2023; originally announced January 2023.

  8. arXiv:2212.02936

    cs.CV

    M-VADER: A Model for Diffusion with Multimodal Context

    Authors: Samuel Weinbach, Marco Bellagente, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Björn Deiseroth, Koen Oostermeijer, Hannah Teufel, Andres Felipe Cruz-Salinas

    Abstract: We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to s…

    Submitted 7 December, 2022; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: 22 pages, 14 figures, 2 tables, fixed figure 3

  9. arXiv:2204.06745

    cs.CL

    GPT-NeoX-20B: An Open-Source Autoregressive Language Model

    Authors: Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach

    Abstract: We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and trainin…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: To appear in the Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models

  10. arXiv:2112.05253

    cs.CV cs.CL

    MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning

    Authors: Constantin Eichenberg, Sidney Black, Samuel Weinbach, Letitia Parcalabescu, Anette Frank

    Abstract: Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of…

    Submitted 24 October, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: 13 pages, 6 figures, 2 tables. Minor improvements. Accepted at EMNLP 2022

    ACM Class: I.2.7; I.4.8; I.5.1

  11. arXiv:2011.06665

    cs.AI

    Domain-Level Explainability -- A Challenge for Creating Trust in Superhuman AI Strategies

    Authors: Jonas Andrulis, Ole Meyer, Grégory Schott, Samuel Weinbach, Volker Gruhn

    Abstract: For strategic problems, intelligent systems based on Deep Reinforcement Learning (DRL) have demonstrated an impressive ability to learn advanced solutions that can go far beyond human capabilities, especially when dealing with complex scenarios. While this creates new opportunities for the development of intelligent assistance systems with groundbreaking functionalities, applying this technology t…

    Submitted 12 November, 2020; originally announced November 2020.