Search | arXiv e-print repository

Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project

Authors: Carolin Penke, Chelsea Maria John, Jan Ebert, Stefan Kesselheim, Andreas Herten

Abstract: The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. This report presents best practices and insights gained from the OpenGPT-X project, a German initiative focused on developing open, multilingual LLMs optimized for European languages. We detail the use of high-pe… ▽ More The training of large language models (LLMs) requires substantial computational resources, complex software stacks, and carefully designed workflows to achieve scalability and efficiency. This report presents best practices and insights gained from the OpenGPT-X project, a German initiative focused on developing open, multilingual LLMs optimized for European languages. We detail the use of high-performance computing (HPC) systems, primarily JUWELS Booster at JSC, for training Teuken-7B, a 7-billion-parameter transformer model. The report covers system architecture, training infrastructure, software choices, profiling and benchmarking tools, as well as engineering and operational challenges. △ Less

Submitted 14 April, 2025; originally announced April 2025.

ACM Class: C.4; I.2.11; I.2.7; K.6

arXiv:2409.12994 [pdf, other]

doi 10.1109/SCW63240.2024.00158

Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML

Authors: Chelsea Maria John, Stepan Nassyr, Carolin Penke, Andreas Herten

Abstract: The rapid advancement of machine learning (ML) technologies has driven the development of specialized hardware accelerators designed to facilitate more efficient model training. This paper introduces the CARAML benchmark suite, which is employed to assess performance and energy consumption during the training of transformer-based large language models and computer vision models on a range of hardw… ▽ More The rapid advancement of machine learning (ML) technologies has driven the development of specialized hardware accelerators designed to facilitate more efficient model training. This paper introduces the CARAML benchmark suite, which is employed to assess performance and energy consumption during the training of transformer-based large language models and computer vision models on a range of hardware accelerators, including systems from NVIDIA, AMD, and Graphcore. CARAML provides a compact, automated, extensible, and reproducible framework for assessing the performance and energy of ML workloads across various novel hardware architectures. The design and implementation of CARAML, along with a custom power measurement tool called jpwr, are discussed in detail. △ Less

Submitted 29 October, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

Comments: To be published in Workshop Proceedings of The International Conference for High Performance Computing Networking, Storage, and Analysis (SC-W '24) (2024)

arXiv:2408.17211 [pdf, other]

doi 10.1109/SC41406.2024.00038

Application-Driven Exascale: The JUPITER Benchmark Suite

Authors: Andreas Herten, Sebastian Achilles, Damian Alvarez, Jayesh Badwaik, Eric Behle, Mathis Bode, Thomas Breuer, Daniel Caviedes-Voullième, Mehdi Cherti, Adel Dabah, Salem El Sayed, Wolfgang Frings, Ana Gonzalez-Nicolas, Eric B. Gregory, Kaveh Haghighi Mood, Thorsten Hater, Jenia Jitsev, Chelsea Maria John, Jan H. Meinke, Catrin I. Meyer, Pavel Mezentsev, Jan-Oliver Mirus, Stepan Nassyr, Carolin Penke, Manoel Römmer , et al. (6 additional authors not shown)

Abstract: Benchmarks are essential in the design of modern HPC installations, as they define key aspects of system components. Beyond synthetic workloads, it is crucial to include real applications that represent user requirements into benchmark suites, to guarantee high usability and widespread adoption of a new system. Given the significant investments in leadership-class supercomputers of the exascale er… ▽ More Benchmarks are essential in the design of modern HPC installations, as they define key aspects of system components. Beyond synthetic workloads, it is crucial to include real applications that represent user requirements into benchmark suites, to guarantee high usability and widespread adoption of a new system. Given the significant investments in leadership-class supercomputers of the exascale era, this is even more important and necessitates alignment with a vision of Open Science and reproducibility. In this work, we present the JUPITER Benchmark Suite, which incorporates 16 applications from various domains. It was designed for and used in the procurement of JUPITER, the first European exascale supercomputer. We identify requirements and challenges and outline the project and software infrastructure setup. We provide descriptions and scalability studies of selected applications and a set of key takeaways. The JUPITER Benchmark Suite is released as open source software with this work at https://github.com/FZJ-JSC/jubench. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: To be published in Proceedings of The International Conference for High Performance Computing Networking, Storage, and Analysis (SC '24) (2024)

ACM Class: B.8.2; C.0; C.5.1; D.1.0; C.4

Journal ref: 2024 SC24: International Conference for High Performance Computing, Networking, Storage and Analysis SC

arXiv:2403.17757 [pdf, other]

Noise2Noise Denoising of CRISM Hyperspectral Data

Authors: Robert Platt, Rossella Arcucci, Cédric M. John

Abstract: Hyperspectral data acquired by the Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) have allowed for unparalleled mapping of the surface mineralogy of Mars. Due to sensor degradation over time, a significant portion of the recently acquired data is considered unusable. Here a new data-driven model architecture, Noise2Noise4Mars (N2N4M), is introduced to remove noise from CRISM images.… ▽ More Hyperspectral data acquired by the Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) have allowed for unparalleled mapping of the surface mineralogy of Mars. Due to sensor degradation over time, a significant portion of the recently acquired data is considered unusable. Here a new data-driven model architecture, Noise2Noise4Mars (N2N4M), is introduced to remove noise from CRISM images. Our model is self-supervised and does not require zero-noise target data, making it well suited for use in Planetary Science applications where high quality labelled data is scarce. We demonstrate its strong performance on synthetic-noise data and CRISM images, and its impact on downstream classification performance, outperforming benchmark methods on most metrics. This allows for detailed analysis for critical sites of interest on the Martian surface, including proposed lander sites. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures. Accepted as a conference paper at the ICLR 2024 ML4RS Workshop

Showing 1–4 of 4 results for author: John, C M