Search | arXiv e-print repository

doi 10.1109/SCW63240.2024.00240

Portability of Fortran's `do concurrent' on GPUs

Authors: Ronald M. Caplan, Miko M. Stulajter, Jon A. Linker, Jeff Larkin, Henry A. Gabb, Shiquan Su, Ivan Rodriguez, Zachary Tschirhart, Nicholas Malaya

Abstract: There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the {\tt do concurrent} (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its co… ▽ More There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the {\tt do concurrent} (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of using DC across GPU vendors using the in-production solar surface flux evolution code, HipFT. We discuss implementation and compilation details, including when/where using directive APIs for data movement is needed/desired compared to using a unified memory system. The performance achieved on both data center and consumer platforms is shown. △ Less

Submitted 23 December, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

Comments: 10 pages, 7 figures, To appear in the workshop proceedings of WACCPD24 at SC24

arXiv:2303.03398 [pdf, other]

Acceleration of a production Solar MHD code with Fortran standard parallelism: From OpenACC to `do concurrent'

Authors: Ronald M. Caplan, Miko M. Stulajter, Jon A. Linker

Abstract: There is growing interest in using standard language constructs for accelerated computing, avoiding the need for (often vendor-specific) external APIs. These constructs hold the potential to be more portable and much more `future-proof'. For Fortran codes, the current focus is on the {\tt do concurrent} (DC) loop. While there have been some successful examples of GPU-acceleration using DC for benc… ▽ More There is growing interest in using standard language constructs for accelerated computing, avoiding the need for (often vendor-specific) external APIs. These constructs hold the potential to be more portable and much more `future-proof'. For Fortran codes, the current focus is on the {\tt do concurrent} (DC) loop. While there have been some successful examples of GPU-acceleration using DC for benchmark and/or small codes, its widespread adoption will require demonstrations of its use in full-size applications. Here, we look at the current capabilities and performance of using DC in a production application called Magnetohydrodynamic Algorithm outside a Sphere (MAS). MAS is a state-of-the-art model for studying coronal and heliospheric dynamics, is over 70,000 lines long, and has previously been ported to GPUs using MPI+OpenACC. We attempt to eliminate as many of its OpenACC directives as possible in favor of DC. We show that using the NVIDIA {\tt nvfortran} compiler's Fortran 202X preview implementation, unified managed memory, and modified MPI launch methods, we can achieve GPU acceleration across multiple GPUs without using a single OpenACC directive. However, doing so results in a slowdown between 1.25x and 3x. We discuss what future improvements are needed to avoid this loss, and show how we can still retain close △ Less

Submitted 8 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

Comments: 10 pages, 2 tables, 4 figures, accepted to the AsHES workshop at IPDPS 2023

MSC Class: 65Y05

arXiv:2110.10151 [pdf, other]

Can Fortran's 'do concurrent' replace directives for accelerated computing?

Authors: Miko M. Stulajter, Ronald M. Caplan, Jon A. Linker

Abstract: Recently, there has been growing interest in using standard language constructs (e.g. C++'s Parallel Algorithms and Fortran's do concurrent) for accelerated computing as an alternative to directive-based APIs (e.g. OpenMP and OpenACC). These constructs have the potential to be more portable, and some compilers already (or have plans to) support such standards. Here, we look at the current capabili… ▽ More Recently, there has been growing interest in using standard language constructs (e.g. C++'s Parallel Algorithms and Fortran's do concurrent) for accelerated computing as an alternative to directive-based APIs (e.g. OpenMP and OpenACC). These constructs have the potential to be more portable, and some compilers already (or have plans to) support such standards. Here, we look at the current capabilities, portability, and performance of replacing directives with Fortran's do concurrent using a mini-app that currently implements OpenACC for GPU-acceleration and OpenMP for multi-core CPU parallelism. We replace as many directives as possible with do concurrent, testing various configurations and compiler options within three major compilers: GNU's gfortran, NVIDIA's nvfortran, and Intel's ifort. We find that with the right compiler versions and flags, many directives can be replaced without loss of performance or portability, and, in the case of nvfortran, they can all be replaced. We discuss limitations that may apply to more complicated codes and future language additions that may mitigate them. The software and Singularity containers are publicly provided to allow the results to be reproduced. △ Less

Submitted 18 October, 2021; originally announced October 2021.

Comments: 18 pages, 2 figures, Accepted for publication at WACCPD 2021

MSC Class: 68U99 ACM Class: D.1.3; D.3.3

Showing 1–3 of 3 results for author: Stulajter, M M