-
SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Authors:
Daniel Lersch,
Malachi Schram,
Zhenyu Dai,
Kishansingh Rajput,
Xingfu Wu,
N. Sato,
J. Taylor Childers
Abstract:
Large scale, inverse problem solving deep learning algorithms have become an essential part of modern research and industrial applications. The complexity of the underlying inverse problem often poses challenges to the algorithm and requires the proper utilization of high-performance computing systems. Most deep learning algorithms require, due to their design, custom parallelization techniques in…
▽ More
Large scale, inverse problem solving deep learning algorithms have become an essential part of modern research and industrial applications. The complexity of the underlying inverse problem often poses challenges to the algorithm and requires the proper utilization of high-performance computing systems. Most deep learning algorithms require, due to their design, custom parallelization techniques in order to be resource efficient while showing a reasonable convergence. In this paper we introduces a \underline{S}calable \underline{A}synchronous \underline{G}enerative workflow for solving \underline{I}nverse \underline{P}roblems \underline{S}olver (SAGIPS) on high-performance computing systems. We present a workflow that utilizes a parallelization approach where the gradients of the generator network are updated in an asynchronous ring-all-reduce fashion. Experiments with a scientific proxy application demonstrate that SAGIPS shows near linear weak scaling, together with a convergence quality that is comparable to traditional methods. The approach presented here allows leveraging GANs across multiple GPUs, promising advancements in solving complex inverse problems at scale.
△ Less
Submitted 11 June, 2024;
originally announced July 2024.
-
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
Authors:
Mohammad Atif,
Meghna Battacharya,
Paolo Calafiura,
Taylor Childers,
Mark Dewing,
Zhihua Dong,
Oliver Gutsche,
Salman Habib,
Kyle Knoepfel,
Matti Kortelainen,
Ka Hei Martin Kwok,
Charles Leggett,
Meifeng Lin,
Vincent Pascuzzi,
Alexei Strelchenko,
Vakhtang Tsulaia,
Brett Viren,
Tianle Wang,
Beomki Yeo,
Haiwang Yu
Abstract:
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with t…
▽ More
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code, for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
Developments in Performance and Portability for MadGraph5_aMC@NLO
Authors:
Andrea Valassi,
Taylor Childers,
Laurence Field,
Stefan Hageböck,
Walter Hopkins,
Olivier Mattelaer,
Nathan Nichols,
Stefan Roiser,
David Smith
Abstract:
Event generators simulate particle interactions using Monte Carlo techniques, providing the primary connection between experiment and theory in experimental high energy physics. These software packages, which are the first step in the simulation worflow of collider experiments, represent approximately 5 to 20% of the annual WLCG usage for the ATLAS and CMS experiments. With computing architectures…
▽ More
Event generators simulate particle interactions using Monte Carlo techniques, providing the primary connection between experiment and theory in experimental high energy physics. These software packages, which are the first step in the simulation worflow of collider experiments, represent approximately 5 to 20% of the annual WLCG usage for the ATLAS and CMS experiments. With computing architectures becoming more heterogeneous, it is important to ensure that these key software frameworks can be run on future systems, large and small. In this contribution, recent progress on porting and speeding up the Madgraph5_aMC@NLO event generator on hybrid architectures, i.e. CPU with GPU accelerators, is discussed. The main focus of this work has been in the calculation of scattering amplitudes and "matrix elements", which is the computational bottleneck of an event generation application. For physics processes limited to QCD leading order, the code generation toolkit has been expanded to produce matrix element calculations using C++ vector instructions on CPUs and using CUDA for NVidia GPUs, as well as using Alpaka, Kokkos and SYCL for multiple CPU and GPU architectures. Performance is reported in terms of matrix element calculations per time on NVidia, Intel, and AMD devices. The status and outlook for the integration of this work into a production release usable by the LHC experiments, with the same functionalities and very similar user interfaces as the current Fortran version, is also described.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
Toward Real-time Analysis of Experimental Science Workloads on Geographically Distributed Supercomputers
Authors:
Michael Salim,
Thomas Uram,
J. Taylor Childers,
Venkat Vishwanath,
Michael E. Papka
Abstract:
Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing policies are prohibitive when faced with scarce inst…
▽ More
Massive upgrades to science infrastructure are driving data velocities upwards while stimulating adoption of increasingly data-intensive analytics. While next-generation exascale supercomputers promise strong support for I/O-intensive workflows, HPC remains largely untapped by live experiments, because data transfers and disparate batch-queueing policies are prohibitive when faced with scarce instrument time. To bridge this divide, we introduce Balsam: a distributed orchestration platform enabling workflows at the edge to securely and efficiently trigger analytics tasks across a user-managed federation of HPC execution sites. We describe the architecture of the Balsam service, which provides a workflow management API, and distributed sites that provision resources and schedule scalable, fault-tolerant execution. We demonstrate Balsam in efficiently scaling real-time analytics from two DOE light sources simultaneously onto three supercomputers (Theta, Summit, and Cori), while maintaining low overheads for on-demand computing, and providing a Python library for seamless integration with existing ecosystems of data analysis tools.
△ Less
Submitted 2 July, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Deep Reinforcement Agent for Scheduling in HPC
Authors:
Yuping Fan,
Zhiling Lan,
Taylor Childers,
Paul Rich,
William Allcock,
Michael E. Papka
Abstract:
Cluster scheduler is crucial in high-performance computing (HPC). It determines when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems and the highly dynamic nature of application worklo…
▽ More
Cluster scheduler is crucial in high-performance computing (HPC). It determines when and which user jobs should be allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems and the highly dynamic nature of application workloads have placed tremendous burden on manually designed and tuned scheduling heuristics. More aggressive optimization and automation are needed for cluster scheduling in HPC. In this work, we present an automated HPC scheduling agent named DRAS (Deep Reinforcement Agent for Scheduling) by leveraging deep reinforcement learning. DRAS is built on a novel, hierarchical neural network incorporating special HPC scheduling features such as resource reservation and backfilling. A unique training strategy is presented to enable DRAS to rapidly learn the target environment. Once being provided a specific scheduling objective given by system manager, DRAS automatically learns to improve its policy through interaction with the scheduling environment and dynamically adjusts its policy as workload changes. The experiments with different production workloads demonstrate that DRAS outperforms the existing heuristic and optimization approaches by up to 45%.
△ Less
Submitted 19 April, 2021; v1 submitted 11 February, 2021;
originally announced February 2021.
-
Balsam: Automated Scheduling and Execution of Dynamic, Data-Intensive HPC Workflows
Authors:
Michael A. Salim,
Thomas D. Uram,
J. Taylor Childers,
Prasanna Balaprakash,
Venkatram Vishwanath,
Michael E. Papka
Abstract:
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically packages tasks into ensemble jobs and manages their…
▽ More
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic multi-task workflows. With abstractions for the local resource scheduler and MPI environment, Balsam dynamically packages tasks into ensemble jobs and manages their scheduling lifecycle. The ensembles execute in a pilot "launcher" which (i) ensures concurrent, load-balanced execution of arbitrary serial and parallel programs with heterogeneous processor requirements, (ii) requires no modification of user applications, (iii) is tolerant of task-level faults and provides several options for error recovery, (iv) stores provenance data (e.g task history, error logs) in the database, (v) supports dynamic workflows, in which tasks are created or killed at runtime. Here, we present the design and Python implementation of the Balsam service and launcher. The efficacy of this system is illustrated using two case studies: hyperparameter optimization of deep neural networks, and high-throughput single-point quantum chemistry calculations. We find that the unique combination of flexible job-packing and automated scheduling with dynamic (pilot-managed) execution facilitates excellent resource utilization. The scripting overheads typically needed to manage resources and launch workflows on supercomputers are substantially reduced, accelerating workflow development and execution.
△ Less
Submitted 18 September, 2019;
originally announced September 2019.
-
Machine Learning in High Energy Physics Community White Paper
Authors:
Kim Albertsson,
Piero Altoe,
Dustin Anderson,
John Anderson,
Michael Andrews,
Juan Pedro Araque Espinosa,
Adam Aurisano,
Laurent Basara,
Adrian Bevan,
Wahid Bhimji,
Daniele Bonacorsi,
Bjorn Burkle,
Paolo Calafiura,
Mario Campanelli,
Louis Capps,
Federico Carminati,
Stefano Carrazza,
Yi-fan Chen,
Taylor Childers,
Yann Coadou,
Elias Coniavitis,
Kyle Cranmer,
Claire David,
Douglas Davis,
Andrea De Simone
, et al. (103 additional authors not shown)
Abstract:
Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We d…
▽ More
Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We detail a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit.
△ Less
Submitted 16 May, 2019; v1 submitted 8 July, 2018;
originally announced July 2018.
-
Adapting the serial Alpgen event generator to simulate LHC collisions on millions of parallel threads
Authors:
J. T. Childers,
T. D. Uram,
T. J. LeCompte,
M. E. Papka,
D. P. Benjamin
Abstract:
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application th…
▽ More
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.
△ Less
Submitted 23 November, 2015;
originally announced November 2015.