-
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
Authors:
Mohammad Atif,
Meghna Battacharya,
Paolo Calafiura,
Taylor Childers,
Mark Dewing,
Zhihua Dong,
Oliver Gutsche,
Salman Habib,
Kyle Knoepfel,
Matti Kortelainen,
Ka Hei Martin Kwok,
Charles Leggett,
Meifeng Lin,
Vincent Pascuzzi,
Alexei Strelchenko,
Vakhtang Tsulaia,
Brett Viren,
Tianle Wang,
Beomki Yeo,
Haiwang Yu
Abstract:
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with t…
▽ More
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code, for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
Safety and Completeness in Flow Decompositions for RNA Assembly
Authors:
Shahbaz Khan,
Milla Kortelainen,
Manuel Cáceres,
Lucia Williams,
Alexandru I. Tomescu
Abstract:
Decomposing a network flow into weighted paths has numerous applications. Some applications require any decomposition that is optimal w.r.t. some property such as number of paths, robustness, or length. Many bioinformatic applications require a specific decomposition where the paths correspond to some underlying data that generated the flow. For real inputs, no optimization criteria guarantees to…
▽ More
Decomposing a network flow into weighted paths has numerous applications. Some applications require any decomposition that is optimal w.r.t. some property such as number of paths, robustness, or length. Many bioinformatic applications require a specific decomposition where the paths correspond to some underlying data that generated the flow. For real inputs, no optimization criteria guarantees to uniquely identify the correct decomposition. Therefore, we propose to report safe paths, i.e., subpaths of at least one path in every flow decomposition.
Ma, Zheng, and Kingsford [WABI 2020] addressed the existence of multiple optimal solutions in a probabilistic framework, i.e., non-identifiability. Later [RECOMB 2021], they gave a quadratic-time algorithm based on a global criterion for solving a problem called AND-Quant, which generalizes the problem of reporting whether a given path is safe.
We give the first local characterization of safe paths for flow decompositions in directed acyclic graphs (DAGs), leading to a practical algorithm for finding the complete set of safe paths. We evaluated our algorithms against the trivial safe algorithms (unitigs, extended unitigs) and the popularly used heuristic (greedy-width) for flow decomposition on RNA transcripts datasets. Despite maintaining perfect precision our algorithm reports significantly higher coverage ($\approx 50\%$ more) than trivial safe algorithms. The greedy-width algorithm though reporting a better coverage, has significantly lower precision on complex graphs. Overall, our algorithm outperforms (by $\approx 20\%$) greedy-width on a unified metric (F-Score) when the dataset has significant number of complex graphs. Moreover, it has superior time ($3-5\times$) and space efficiency ($1.2-2.2\times$), resulting in a better and more practical approach for bioinformatics applications of flow decomposition.
△ Less
Submitted 25 January, 2022;
originally announced January 2022.
-
Parallelizing the Unpacking and Clustering of Detector Data for Reconstruction of Charged Particle Tracks on Multi-core CPUs and Many-core GPUs
Authors:
Giuseppe Cerati,
Peter Elmer,
Brian Gravelle,
Matti Kortelainen,
Vyacheslav Krutelyov,
Steven Lantz,
Mario Masciovecchio,
Kevin McDermott,
Boyana Norris,
Allison Reinsvold Hall,
Micheal Reid,
Daniel Riley,
Matevž Tadel,
Peter Wittich,
Bei Wang,
Frank Würthwein,
Avraham Yagil
Abstract:
We present results from parallelizing the unpacking and clustering steps of the raw data from the silicon strip modules for reconstruction of charged particle tracks. Throughput is further improved by concurrently processing multiple events using nested OpenMP parallelism on CPU or CUDA streams on GPU. The new implementation along with earlier work in developing a parallelized and vectorized imple…
▽ More
We present results from parallelizing the unpacking and clustering steps of the raw data from the silicon strip modules for reconstruction of charged particle tracks. Throughput is further improved by concurrently processing multiple events using nested OpenMP parallelism on CPU or CUDA streams on GPU. The new implementation along with earlier work in developing a parallelized and vectorized implementation of the combinatoric Kalman filter algorithm has enabled efficient global reconstruction of the entire event on modern computer architectures. We demonstrate the performance of the new implementation on Intel Xeon and NVIDIA GPU architectures.
△ Less
Submitted 27 January, 2021;
originally announced January 2021.
-
Advancing Nuclear Physics Through TOPS Solvers and Tools
Authors:
E Ng,
J Sarich,
S M Wild,
T Munson,
H Aktulga,
C Yang,
P Maris,
J P Vary,
N Schunck,
M G Bertolli,
M Kortelainen,
W Nazarewicz,
T Papenbrock,
M V Stoitsov
Abstract:
At the heart of many scientific applications is the solution of algebraic systems, such as linear systems of equations, eigenvalue problems, and optimization problems, to name a few. TOPS, which stands for Towards Optimal Petascale Simulations, is a SciDAC applied math center focused on the development of solvers for tackling these algebraic systems, as well as the deployment of such technologies…
▽ More
At the heart of many scientific applications is the solution of algebraic systems, such as linear systems of equations, eigenvalue problems, and optimization problems, to name a few. TOPS, which stands for Towards Optimal Petascale Simulations, is a SciDAC applied math center focused on the development of solvers for tackling these algebraic systems, as well as the deployment of such technologies in large-scale scientific applications of interest to the U.S. Department of Energy. In this paper, we highlight some of the solver technologies we have developed in optimization and matrix computations. We also describe some accomplishments achieved using these technologies in UNEDF, a SciDAC application project on nuclear physics.
△ Less
Submitted 8 October, 2011;
originally announced October 2011.