-
Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++
Authors:
Giulio Malenza,
Giovanni Stabile,
Filippo Spiga,
Robert Birke,
Marco Aldinucci
Abstract:
The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading manufacturers such as NVIDIA, Intel, and AMD are constantly advancing these architectures, augmenting them with features such as mixed precision, enhanced memory hie…
▽ More
The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading manufacturers such as NVIDIA, Intel, and AMD are constantly advancing these architectures, augmenting them with features such as mixed precision, enhanced memory hierarchies, and specialised accelerator silicon blocks (e.g., Tensor Cores on GPU or AMX/SME engines on CPU) to enhance compute performance. At the same time, significant efforts in software development are aimed at optimizing the use of these innovations, seeking to improve usability and accessibility. This work contributes to the state-of-the-art of OpenFOAM development by presenting a working Proof-Of-Concept application built using modern ISO C++ parallel constructs. This approach, combined with an appropriate compiler runtime stack, like the one provided by the NVIDIA HPC SDK, makes it possible to accelerate well-defined kernels, allowing multi-core execution and GPU offloading using a single codebase. The study demonstrates that it is possible to increase the performance of the OpenFOAM laplacianFoam application by offloading the computations on NVIDIA GPUs using the C++ parallel construct.
△ Less
Submitted 24 July, 2025;
originally announced July 2025.
-
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
Authors:
Daniele De Sensi,
Lorenzo Pichetti,
Flavio Vella,
Tiziano De Matteis,
Zebin Ren,
Luigi Fusco,
Matteo Turisini,
Daniele Cesarini,
Kurt Lust,
Animesh Trivedi,
Duncan Roweth,
Filippo Spiga,
Salvatore Di Girolamo,
Torsten Hoefler
Abstract:
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency is challenging due to different technologies, design options, and software layers. This pape…
▽ More
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency is challenging due to different technologies, design options, and software layers. This paper comprehensively characterizes three supercomputers - Alps, Leonardo, and LUMI - each with a unique architecture and design. We focus on performance evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing its limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped bandwidth, and there are still many opportunities for optimization, ranging from network to software optimization.
△ Less
Submitted 15 November, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Design of an energy aware petaflops class high performance cluster based on power architecture
Authors:
W. A. Ahmad,
A. Bartolini,
F. Beneventi,
L. Benini,
A. Borghesi,
M. Cicala,
P. Forestieri,
C. Gianfreda,
D. Gregori,
A. Libri,
F. Spiga,
S. Tinti
Abstract:
In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM's POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 10…
▽ More
In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM's POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 100 Gb/s networking) plus custom hardware and an innovative system middleware software. D.A.V.I.D.E. features (i) a dedicated power monitor interface, built around the BeagleBone Black Board that allows high frequency sampling directly from the power backplane and scalable integration with the internal node telemetry and system level power management software; (ii) a custom-built chassis, based on OpenRack form factor, and liquid cooling that allows the system to be used in modern, energy efficient, datacenter; (iii) software components designed for enabling fine grain power monitoring, power management (i.e. power capping and energy aware job scheduling) and application power profiling, based on dedicated machine learning components. Software APIs are offered to developers and users to tune the computing node performance and power consumption around on the application requirements. The first pilot system that we will deploy at the beginning of 2017, will demonstrate key HPC applications from different fields ported and optimized for this innovative platform.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
Authors:
Wael Elwasif,
William Godoy,
Nick Hagerty,
J. Austin Harris,
Oscar Hernandez,
Balint Joo,
Paul Kent,
Damien Lebrun-Grandie,
Elijah Maccarthy,
Veronica G. Melesse Vergara,
Bronson Messer,
Ross Miller,
Sarp Opal,
Sergei Bastrakov,
Michael Bussmann,
Alexander Debus,
Klaus Steinger,
Jan Stephan,
Rene Widera,
Spencer H. Bryngelson,
Henry Le Berre,
Anand Radhakrishnan,
Jefferey Young,
Sunita Chandrasekaran,
Florina Ciorba
, et al. (6 additional authors not shown)
Abstract:
This paper assesses and reports the experience of ten teams working to port,validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and A100 data center GPU from NVIDIA Corp. The syst…
▽ More
This paper assesses and reports the experience of ten teams working to port,validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems built by GIGABYTE, each one equipped with a server-class Arm CPU from Ampere Computing and A100 data center GPU from NVIDIA Corp. The systems are connected together using Infiniband high-bandwidth low-latency interconnect. The selected applications and mini-apps are written using several programming languages and use multiple accelerator-based programming models for GPUs such as CUDA, OpenACC, and OpenMP offloading. Working on application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
△ Less
Submitted 19 December, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.