-
Packaging HEP Heterogeneous Mini-apps for Portable Benchmarking and Facility Evaluation on Modern HPCs
Authors:
Mohammad Atif,
Pengfei Ding,
Ka Hei Martin Kwok,
Charles Leggett
Abstract:
High Energy Physics (HEP) experiments are making increasing use of GPUs and GPU-dominated High Performance Computing facilities. Both the software and hardware of these systems are rapidly evolving, making it challenging for experiments to make informed decisions about where to devote resources. In its first phase, the High Energy Physics Center for Computational Excellence (HEP-CCE) produced portable versions of a number of heterogeneous HEP mini-apps, such as p2r, FastCaloSim, Patatrack and the WireCell Toolkit, that exercise a broad range of GPU characteristics, enabling cross-platform and cross-facility benchmarking and evaluation. However, these mini-apps still require a significant amount of manual intervention to deploy on a new facility.
We present our work in developing turn-key deployments of these mini-apps, where by means of containerization and automated configuration and build techniques such as Spack, we are able to quickly test new hardware, software, environments and entire facilities with minimal user intervention, and then track performance metrics over time.
Submitted 13 May, 2025;
originally announced May 2025.
-
A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading
Authors:
Mohammad Atif,
Tianle Wang,
Zhihua Dong,
Charles Leggett,
Meifeng Lin
Abstract:
We present a framework based on Catch2 to evaluate the performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model for heterogeneous architectures are currently undergoing rapid development. These developments influence the performance of complex applications in different ways. This framework can be employed to track the impact of compiler upgrades and to compare their performance with that of the native programming models. We use the framework to benchmark the performance of a few commonly used operations on leadership-class supercomputers such as Perlmutter at the National Energy Research Scientific Computing Center (NERSC) and Frontier at the Oak Ridge Leadership Computing Facility (OLCF). Such a framework will be useful for compiler developers to gain insights into the overall impact of many small changes, as well as for users to decide which compilers and versions are expected to yield the best performance for their applications.
Submitted 1 March, 2025;
originally announced March 2025.
-
Fourier neural operators for spatiotemporal dynamics in two-dimensional turbulence
Authors:
Mohammad Atif,
Pulkit Dubey,
Pratik P. Aghor,
Vanessa Lopez-Marrero,
Tao Zhang,
Abdullah Sharfuddin,
Kwangmin Yu,
Fan Yang,
Foluso Ladeinde,
Yangang Liu,
Meifeng Lin,
Lingda Li
Abstract:
High-fidelity direct numerical simulation of turbulent flows for most real-world applications remains an outstanding computational challenge. Several machine learning approaches have recently been proposed to alleviate the computational cost even though they become unstable or unphysical for long time predictions. We identify that the Fourier neural operator (FNO) based models combined with a partial differential equation (PDE) solver can accelerate fluid dynamic simulations and thus address computational expense of large-scale turbulence simulations. We treat the FNO model on the same footing as a PDE solver and answer important questions about the volume and temporal resolution of data required to build pre-trained models for turbulence. We also discuss the pitfalls of purely data-driven approaches that need to be avoided by the machine learning models to become viable and competitive tools for long time simulations of turbulence.
Submitted 25 September, 2024; v1 submitted 22 September, 2024;
originally announced September 2024.
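The abstract above does not describe the network architecture, so the following is only a minimal one-dimensional sketch of the spectral convolution that FNO layers are generally built around: transform the input to Fourier space, keep the lowest modes, scale each by a learned complex weight, and transform back. The function names, the naive DFT, and the 1D setting are illustrative assumptions; the paper itself targets two-dimensional turbulence.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_conv_1d(u, weights):
    """Scale the lowest len(weights) Fourier modes of a real signal u.

    weights[k] plays the role of a learned (possibly complex) multiplier for
    mode k. All higher modes are set to zero. Hypothetical helper, not the
    authors' implementation.
    """
    n = len(u)
    modes = len(weights)
    if modes > (n + 1) // 2:
        raise ValueError("number of retained modes must stay below Nyquist")
    u_hat = dft(u)
    out_hat = [0j] * n
    for k in range(modes):
        out_hat[k] = weights[k] * u_hat[k]
        if k > 0:
            # Mirror to the negative frequency so the output stays real.
            out_hat[n - k] = out_hat[k].conjugate()
    return [value.real for value in idft(out_hat)]
```

With all retained weights set to 1, the function acts as a plain low-pass filter, which is a convenient sanity check before replacing the weights with trained parameters.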
-
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
Authors:
Mohammad Atif,
Meghna Battacharya,
Paolo Calafiura,
Taylor Childers,
Mark Dewing,
Zhihua Dong,
Oliver Gutsche,
Salman Habib,
Kyle Knoepfel,
Matti Kortelainen,
Ka Hei Martin Kwok,
Charles Leggett,
Meifeng Lin,
Vincent Pascuzzi,
Alexei Strelchenko,
Vakhtang Tsulaia,
Brett Viren,
Tianle Wang,
Beomki Yeo,
Haiwang Yu
Abstract:
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code, for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.
Submitted 27 June, 2023;
originally announced June 2023.
-
Portable Programming Model Exploration for LArTPC Simulation in a Heterogeneous Computing Environment: OpenMP vs. SYCL
Authors:
Meifeng Lin,
Zhihua Dong,
Tianle Wang,
Mohammad Atif,
Meghna Battacharya,
Kyle Knoepfel,
Charles Leggett,
Brett Viren,
Haiwang Yu
Abstract:
The evolution of the computing landscape has resulted in the proliferation of diverse hardware architectures, with different flavors of GPUs and other compute accelerators becoming more widely available. To facilitate the efficient use of these architectures in a heterogeneous computing environment, several programming models are available to enable portability and performance across different computing systems, such as Kokkos, SYCL, OpenMP and others. As part of the High Energy Physics Center for Computational Excellence (HEP-CCE) project, we investigate if and how these different programming models may be suitable for experimental HEP workflows through a few representative use cases. One such use case is the Liquid Argon Time Projection Chamber (LArTPC) simulation, which is essential for LArTPC detector design, validation and data analysis. Following up on our previous investigations of using Kokkos to port LArTPC simulation in the Wire-Cell Toolkit (WCT) to GPUs, we have explored OpenMP and SYCL as potential portable programming models for WCT, with the goal of making diverse computing resources accessible to the LArTPC simulations. In this work, we describe how we utilize relevant features of OpenMP and SYCL for the LArTPC simulation module in WCT. We also show performance benchmark results on multi-core CPUs and on NVIDIA and AMD GPUs for both the OpenMP and the SYCL implementations. Comparisons with different compilers will also be given where appropriate.
Submitted 4 April, 2023;
originally announced April 2023.
-
A Vision-Based Algorithm for a Path Following Problem
Authors:
Mario Terlizzi,
Giuseppe Silano,
Luigi Russo,
Muhammad Aatif,
Amin Basiri,
Valerio Mariani,
Luigi Iannelli,
Luigi Glielmo
Abstract:
A novel prize-winning algorithm designed for a path-following problem within the Unmanned Aerial Vehicle (UAV) field is presented in this paper. The proposed approach exploits the advantages offered by the pure pursuit algorithm to set up an intuitive and simple control framework. A path for a quad-rotor UAV is obtained by using downward-facing camera images implementing an Image-Based Visual Servoing (IBVS) approach. Numerical simulations in MATLAB together with the MathWorks Virtual Reality (VR) toolbox demonstrate the validity and the effectiveness of the proposed solution. The code is released as open source, making it possible to go through any part of the system and to replicate the obtained results.
Submitted 9 February, 2023;
originally announced February 2023.
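For readers unfamiliar with pure pursuit, the sketch below shows the textbook planar form of the idea: pick a look-ahead point on the path and steer along the circular arc that passes through it. It is an illustrative assumption about the general technique, not the authors' MATLAB implementation; the function name and the pose convention (x, y, heading in radians) are hypothetical.

```python
import math


def pure_pursuit_curvature(pose, target):
    """Curvature of the arc from the vehicle pose to a look-ahead target.

    pose   -- (x, y, heading) in the world frame, heading in radians
    target -- (x, y) look-ahead point on the path, in the world frame
    """
    x, y, heading = pose
    dx, dy = target[0] - x, target[1] - y
    # Rotate the offset into the vehicle frame (x forward, y to the left).
    local_x = math.cos(heading) * dx + math.sin(heading) * dy
    local_y = -math.sin(heading) * dx + math.cos(heading) * dy
    dist_sq = local_x ** 2 + local_y ** 2
    if dist_sq == 0.0:
        raise ValueError("look-ahead target coincides with the vehicle position")
    # Classic pure pursuit relation: curvature = 2 * lateral offset / distance^2.
    return 2.0 * local_y / dist_sq
```

A target straight ahead gives zero curvature, while a target to the left gives a positive value, so the sign convention can be checked directly against the chosen frame.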
-
Structured Nonnegative Matrix Factorization for Traffic Flow Estimation of Large Cloud Networks
Authors:
Syed Muhammad Atif,
Nicolas Gillis,
Sameer Qazi,
Imran Naseem
Abstract:
Network traffic matrix estimation is an ill-posed linear inverse problem: it requires estimating the unobservable origin-destination (OD) traffic flows, X, given the observable link traffic flows, Y, and a binary routing matrix, A, such that Y = AX. This is a challenging but vital problem, as accurate estimation of OD flows is required for several network management tasks. In this paper, we propose a novel model for the network traffic matrix estimation problem which maps high-dimensional OD flows to low-dimensional latent flows with the following three constraints: (1) a nonnegativity constraint on the estimated OD flows, (2) an autoregression constraint that enables the proposed model to effectively capture temporal patterns of the OD flows, and (3) an orthogonality constraint that ensures the mapping between low-dimensional latent flows and the corresponding link flows is distance preserving. The parameters of the proposed model are estimated with a training algorithm based on Nesterov accelerated gradient, which generally shows fast convergence. We validate the proposed traffic flow estimation model on two real backbone IP network datasets, namely Internet2 and GÉANT. Empirical results show that the proposed model outperforms the state-of-the-art models not only in terms of tracking the individual OD flows but also in terms of standard performance metrics. The proposed model is also found to be highly scalable compared to the existing state-of-the-art approaches.
Submitted 15 October, 2021;
originally announced October 2021.
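To make the ill-posedness of Y = AX concrete, here is a small sketch with a made-up network of two links and three OD flows. The routing matrix and flow values are invented for illustration and do not come from the paper's datasets.

```python
def link_loads(routing, od_flows):
    """Compute Y = A X for a binary routing matrix A (links x OD pairs)."""
    return [sum(a * x for a, x in zip(row, od_flows)) for row in routing]


# Hypothetical network: 2 observable links, 3 unobservable OD flows.
A = [
    [1, 1, 0],  # link 0 carries OD flows 0 and 1
    [0, 1, 1],  # link 1 carries OD flows 1 and 2
]

x_true = [4, 2, 5]
# Adding the null-space vector (1, -1, 1) leaves every link load unchanged.
x_other = [5, 1, 6]

print(link_loads(A, x_true))   # link loads produced by the true flows
print(link_loads(A, x_other))  # identical loads from a different flow vector
```

Both flow vectors are nonnegative and produce the same link measurements, which is why additional structure, such as the temporal and orthogonality constraints described in the abstract, is needed to single out one solution.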
-
Multi-Kernel Fusion for RBF Neural Networks
Authors:
Syed Muhammad Atif,
Shujaat Khan,
Imran Naseem,
Roberto Togneri,
Mohammed Bennamoun
Abstract:
The simple yet effective architectural design of radial basis function neural networks (RBFNN) makes them among the most popular conventional neural networks. The current generation of radial basis function neural networks is equipped with multiple kernels, which provide significant performance benefits compared to the previous generation using only a single kernel. In existing multi-kernel RBF algorithms, the multi-kernel is formed by a convex combination of the base/primary kernels. In this paper, we propose a novel multi-kernel RBFNN in which every base kernel has its own (local) weight. This added flexibility in the network provides better performance, such as a faster convergence rate, better local minima and resilience against getting stuck in poor local minima. These performance gains are achieved at a competitive computational complexity compared to contemporary multi-kernel RBF algorithms. The proposed algorithm is thoroughly analysed for performance gain using mathematical and graphical illustrations and is also evaluated on three different types of problems, namely: (i) pattern classification, (ii) system identification and (iii) function approximation. Empirical results clearly show the superiority of the proposed algorithm compared to the existing state-of-the-art multi-kernel approaches.
Submitted 6 July, 2020;
originally announced July 2020.
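The difference between a global convex mixture and per-kernel local weights can be sketched as follows. This is a rough illustration under stated assumptions: the choice of a Gaussian and a cosine base kernel, the function names, and the omission of a bias term are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_kernel(x, center, width=1.0):
    """Gaussian base kernel on the Euclidean distance to the center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def cosine_kernel(x, center, eps=1e-12):
    """Cosine-similarity base kernel between the input and the center."""
    dot = sum(a * b for a, b in zip(x, center))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in center))
    return dot / (norms + eps)


def global_mix_output(x, centers, weights, alpha):
    """Conventional fusion: one convex mixing factor alpha shared by all neurons."""
    return sum(
        w * (alpha * gaussian_kernel(x, c) + (1.0 - alpha) * cosine_kernel(x, c))
        for c, w in zip(centers, weights)
    )


def local_weight_output(x, centers, gauss_weights, cos_weights):
    """Local fusion: every base kernel of every neuron has its own weight."""
    return sum(
        wg * gaussian_kernel(x, c) + wc * cosine_kernel(x, c)
        for c, wg, wc in zip(centers, gauss_weights, cos_weights)
    )
```

Setting each local pair to `alpha * w` and `(1 - alpha) * w` reproduces the global mixture, so in this sketch the local form is a strict generalization with more free parameters to train.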
-
A Novel Compressed Sensing Technique for Traffic Matrix Estimation of Software Defined Cloud Networks
Authors:
Sameer Qazi,
Syed Muhammad Atif,
Muhammad Bilal Kadri
Abstract:
Traffic matrix estimation has long attracted attention from researchers for better network management and future planning. With the advent of high traffic loads due to cloud computing platforms and Software Defined Networking based tunable routing and traffic management algorithms on the Internet, it is more necessary than ever to be able to predict current and future traffic volumes on the network. For large networks, such an origin-destination traffic prediction problem takes the form of a large under-constrained and under-determined system of equations with a dynamic measurement matrix. In this work, we present our Compressed Sensing with Dynamic Model Estimation (CS-DME) architecture, suitable for modern software defined networks. Our main contributions are: (1) We formulate an approach in which the measurement matrix in the compressed sensing scheme can be accurately and dynamically estimated through a reformulation of the problem based on traffic demands. (2) By inspecting its eigenspectrum on two real-world datasets, we show that a problem formulation using a dynamic measurement matrix based on instantaneous traffic demands may be used instead of a stationary binary routing matrix, and that it is better suited to modern software defined networks, whose routing is constantly evolving. (3) We also show that linking this compressed measurement matrix dynamically with the measured parameters can lead to acceptable estimation of Origin Destination (OD) traffic flows, with results only marginally poorer than those of other state-of-the-art schemes relying on fixed measurement matrices. (4) Furthermore, using this compressed reformulated problem, a new strategy for selecting vantage points for the most efficient traffic matrix estimation is presented through a secondary compression technique based on a subset of link measurements.
Submitted 6 November, 2018;
originally announced November 2018.
-
Soft Computing Techniques for Dependable Cyber-Physical Systems
Authors:
Muhammad Atif,
Siddique Latif,
Rizwan Ahmad,
Adnan Khalid Kiani,
Junaid Qadir,
Adeel Baig,
Hisao Ishibuchi,
Waseem Abbas
Abstract:
Cyber-Physical Systems (CPS) allow us to manipulate objects in the physical world by providing a communication bridge between computation and actuation elements. In the current scheme of things, this sought-after control is marred by limitations inherent in the underlying communication network(s) as well as by the uncertainty found in the physical world. These limitations hamper fine-grained control of elements that may be separated by large distances. In this regard, soft computing is an emerging paradigm that can help to overcome the vulnerabilities and unreliability of CPS by using techniques including fuzzy systems, neural networks, evolutionary computation, probabilistic reasoning and rough sets. In this paper, we present a comprehensive contemporary review of soft computing techniques for CPS dependability modeling, analysis, and improvement. This paper provides an overview of CPS applications, explores the foundations of dependability engineering, and highlights the potential role of soft computing techniques for CPS dependability with various case studies, while identifying common pitfalls and future directions. In addition, this paper provides a comprehensive survey on the use of various soft computing techniques for making CPS dependable.
Submitted 27 July, 2020; v1 submitted 25 January, 2018;
originally announced January 2018.