-
Deep RC: A Scalable Data Engineering and Deep Learning Pipeline
Authors:
Arup Kumar Sarker,
Aymen Alsaadi,
Alexander James Halpern,
Prabhath Tangella,
Mikhail Titov,
Niranda Perera,
Mills Staylor,
Gregor von Laszewski,
Shantenu Jha,
Geoffrey Fox
Abstract:
Scientific domains including genetics, climate modeling, and astronomy face significant obstacles in managing, preprocessing, and training on complicated data for deep learning. Although several large-scale solutions offer distributed execution environments, open-source alternatives that integrate scalable runtime tools, deep learning, and data frameworks on high-performance computing platforms remain crucial for accessibility and flexibility. In this paper, we introduce Deep Radical-Cylon (RC), a heterogeneous runtime system that combines data engineering, deep learning frameworks, and workflow engines across several HPC environments, including cloud and supercomputing infrastructures. Deep RC supports heterogeneous systems with accelerators, allows the use of communication libraries like MPI, GLOO and NCCL across multi-node setups, and facilitates parallel and distributed deep learning pipelines by utilizing Radical Pilot as a task execution framework. In an end-to-end pipeline comprising preprocessing, model training, and postprocessing with 11 neural forecasting models (PyTorch) and hydrology models (TensorFlow) under identical resource conditions, the system reduces execution time by 3.28 and 75.9 seconds, respectively. The design of Deep RC ensures the smooth integration of scalable data frameworks, such as Cylon, with deep learning processes, exhibiting strong performance on cloud platforms and scientific HPC systems. By offering a flexible, high-performance solution for resource-intensive applications, this method closes the gap between data preprocessing, model training, and postprocessing.
Submitted 22 April, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Exascale Workflow Applications and Middleware: An ExaWorks Retrospective
Authors:
Aymen Alsaadi,
Mihael Hategan-Marandiuc,
Ketan Maheshwari,
Andre Merzky,
Mikhail Titov,
Matteo Turilli,
Andreas Wilke,
Justin M. Wozniak,
Kyle Chard,
Rafael Ferreira da Silva,
Shantenu Jha,
Daniel Laney
Abstract:
Exascale computers offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. However, these software combinations and integrations are difficult to achieve due to the challenges of coordinating and deploying heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which addresses many of these challenges. We developed a workflow Software Development Toolkit (SDK), a curated collection of workflow technologies that can be composed and interoperated through a common interface, engineered following current best practices, and specifically designed to work on HPC platforms. ExaWorks also developed PSI/J, a job management abstraction API, to simplify the construction of portable software components and applications that can be used over various HPC schedulers. The PSI/J API is a minimal interface for submitting and monitoring jobs and their execution state across multiple and commonly used HPC schedulers. We also describe several leading and innovative workflow examples of ExaWorks tools used on DOE leadership platforms. Furthermore, we discuss how our project is working with the workflow community, large computing facilities, and HPC platform vendors to address the requirements of workflows sustainably at the exascale.
Submitted 15 November, 2024;
originally announced November 2024.
-
ExaWorks Software Development Kit: A Robust and Scalable Collection of Interoperable Workflow Technologies
Authors:
Matteo Turilli,
Mihael Hategan-Marandiuc,
Mikhail Titov,
Ketan Maheshwari,
Aymen Alsaadi,
Andre Merzky,
Ramon Arambula,
Mikhail Zakharchanka,
Matt Cowan,
Justin M. Wozniak,
Andreas Wilke,
Ozgur Ozan Kilic,
Kyle Chard,
Rafael Ferreira da Silva,
Shantenu Jha,
Daniel Laney
Abstract:
Scientific discovery increasingly requires executing heterogeneous scientific workflows on high-performance computing (HPC) platforms. Heterogeneous workflows contain different types of tasks (e.g., simulation, analysis, and learning) that need to be mapped, scheduled, and launched on different computing resources. That requires a software stack that enables users to code their workflows and automate resource management and workflow execution. Currently, there are many workflow technologies with diverse levels of robustness and capabilities, and users face difficult choices of software that can effectively and efficiently support their use cases on HPC machines, especially when considering the latest exascale platforms. We contributed to addressing this issue by developing the ExaWorks Software Development Kit (SDK). The SDK is a curated collection of workflow technologies engineered following current best practices and specifically designed to work on HPC platforms. We present our experience with (1) curating those technologies, (2) integrating them to provide users with new capabilities, (3) developing a continuous integration platform to test the SDK on DOE HPC platforms, (4) designing a dashboard to publish the results of those tests, and (5) devising an innovative documentation platform to help users use those technologies. Our experience details the requirements and the best practices needed to curate workflow technologies, and it also serves as a blueprint for the capabilities and services that DOE will have to offer to support a variety of scientific heterogeneous workflows on the newly available exascale HPC platforms.
Submitted 23 July, 2024;
originally announced July 2024.
-
Hydra: Brokering Cloud and HPC Resources to Support the Execution of Heterogeneous Workloads at Scale
Authors:
Aymen Alsaadi,
Shantenu Jha,
Matteo Turilli
Abstract:
Scientific discovery increasingly depends on middleware that enables the execution of heterogeneous workflows on heterogeneous platforms. One of the main challenges is to design software components that integrate within the existing ecosystem to enable scale and performance across cloud and high-performance computing (HPC) platforms. Researchers are met with a varied computing landscape, which includes services available on commercial cloud platforms, data and network capabilities specifically designed for scientific discovery on government-sponsored cloud platforms, and scale and performance on HPC platforms. We present Hydra, an intra/cross-cloud HPC brokering system capable of concurrently acquiring resources from commercial/private cloud and HPC platforms and managing the execution of heterogeneous workflow applications on those resources. This paper offers four main contributions: (1) the design of brokering capabilities in the presence of task, platform, resource, and middleware heterogeneity; (2) a reference implementation of that design with Hydra; (3) an experimental characterization of Hydra's overheads and strong/weak scaling with heterogeneous workloads and platforms; and (4) the implementation of a workflow that models sea rise with Hydra and its scaling on cloud and HPC platforms.
Submitted 16 July, 2024;
originally announced July 2024.
-
Design and Implementation of an Analysis Pipeline for Heterogeneous Data
Authors:
Arup Kumar Sarker,
Aymen Alsaadi,
Niranda Perera,
Mills Staylor,
Gregor von Laszewski,
Matteo Turilli,
Ozgur Ozan Kilic,
Mikhail Titov,
Andre Merzky,
Shantenu Jha,
Geoffrey Fox
Abstract:
Managing and preparing complex data for deep learning, a prevalent approach in large-scale data science, can be challenging. Data transfer for model training also presents difficulties, impacting scientific fields like genomics, climate modeling, and astronomy. A large-scale solution like Google Pathways, with a distributed execution environment for deep learning models, exists but is proprietary. Integrating existing open-source, scalable runtime tools and data frameworks on high-performance computing (HPC) platforms is crucial to address these challenges. Our objective is to establish a smooth and unified method of combining data engineering and deep learning frameworks with diverse execution capabilities that can be deployed on various high-performance computing platforms, including cloud and supercomputers. We aim to support heterogeneous systems with accelerators, where Cylon and other data engineering and deep learning frameworks can utilize heterogeneous execution. To achieve this, we propose Radical-Cylon, a heterogeneous runtime system with a parallel and distributed data framework to execute Cylon as a task of Radical Pilot. We thoroughly explain Radical-Cylon's design and development and the execution process of Cylon tasks using Radical Pilot. This approach enables the use of heterogeneous MPI communicators across multiple nodes. Radical-Cylon achieves better performance than Bare-Metal Cylon with minimal and constant overhead. Radical-Cylon achieves 4-15% faster execution time than batch execution while performing similar join and sort operations with 35 million and 3.5 billion rows with the same resources. The approach aims to excel in both scientific and engineering research HPC systems while demonstrating robust performance on cloud infrastructures. This dual capability fosters collaboration and innovation within the open-source scientific research community.
Submitted 7 April, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
Fast Dust Sand Image Enhancement Based on Color Correction and New Membership Function
Authors:
Ali Hakem Alsaeedi,
Suha Mohammed Hadi,
Yarub Alazzawi
Abstract:
Images captured in dusty environments suffer from poor visibility and quality. Enhancement of these images, such as sand dust images, plays a critical role in various atmospheric optics applications. In this work, we propose a new model based on color correction and a new membership function to enhance sand dust images. The proposed model consists of three phases: correction of color shift, removal of haze, and enhancement of contrast and brightness. The color shift is corrected using a new membership function to adjust the values of U and V in the YUV color space. The Adaptive Dark Channel Prior (A-DCP) is used for haze removal. Contrast stretching and brightness improvement are based on Contrast Limited Adaptive Histogram Equalization (CLAHE). The proposed model is tested and evaluated on many real sand dust images. The experimental results show that the proposed solution outperforms current studies in effectively removing the red and yellow cast and produces enhanced dust images that rate well in both qualitative and quantitative terms.
Submitted 27 July, 2023;
originally announced July 2023.
-
RADICAL-Pilot and Parsl: Executing Heterogeneous Workflows on HPC Platforms
Authors:
Aymen Alsaadi,
Logan Ward,
Andre Merzky,
Kyle Chard,
Ian Foster,
Shantenu Jha,
Matteo Turilli
Abstract:
Workflow applications are becoming increasingly important to support scientific discovery. That is leading to a proliferation of workflow management systems and, thus, to a fragmented software ecosystem. Integration among existing workflow tools can improve development efficiency and, ultimately, increase the sustainability of scientific workflow software. We describe our experience with integrating RADICAL-Pilot (RP) and Parsl as a way to enable users to develop and execute workflow applications with heterogeneous tasks on heterogeneous high-performance computing resources. We describe our approach to the integration of the two systems and detail the development of RPEX, a Parsl executor which uses RP as its workload manager. We develop an RP executor that executes heterogeneous MPI Python functions on CPU cores and GPUs. We measure the weak and strong scaling of RPEX, RP, and Parsl when providing new capabilities to two paradigmatic use cases: Colmena and Ice Wedge Polygons.
Submitted 30 August, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Extended Particle Swarm Optimization (EPSO) for Feature Selection of High Dimensional Biomedical Data
Authors:
Ali Hakem Alsaeedi,
Adil L. Albukhnefis,
Dhiah Al-Shammary,
Muntasir Al-Asfoor
Abstract:
This paper proposes a novel Extended Particle Swarm Optimization (EPSO) model that potentially enhances the search process of PSO for optimization problems. Gene expression profiles are an important measurement factor in molecular biology and are used in the medical diagnosis of cancer types. The challenge for certain classification methodologies applied to gene expression profiles lies in the thousands of features recorded for each sample. A modified wrapper feature selection model is applied to address the gene classification challenge by replacing its randomness approach with EPSO and PSO, respectively. EPSO initializes a population of random size and divides it into two groups in order to promote exploration and reduce the probability of stagnation. Experimentally, EPSO required less processing time to select the optimal features (average of 62.14 sec) than PSO (average of 95.72 sec). Furthermore, EPSO provided better classification accuracy (ranging from 54% to 100%) than PSO (ranging from 52% to 96%).
Submitted 8 August, 2020;
originally announced August 2020.
-
Palm Vein Identification based on hybrid features selection model
Authors:
Mohammed Hamzah Abed,
Ali H. Alsaeedi,
Ali D. Alfoudi,
Abayomi M. Otebolaku,
Yasmeen Sajid Razooqi
Abstract:
Palm vein identification (PVI) is a modern biometric security technique used to strengthen security and authentication systems. The key characteristics of palm vein patterns are that they are unique to each individual, cannot be forgotten, are non-intrusive to capture, and cannot be taken by an unauthorized person. However, the features extracted from the palm vein pattern are numerous and highly redundant. In this paper, we propose a combined model of Two-Dimensional Discrete Wavelet Transform (2D-DWT), Principal Component Analysis (PCA), and Particle Swarm Optimization (PSO), called 2D-DWTPP, to enhance the prediction of palm vein patterns. The 2D-DWT extracts features from palm vein images, and PCA reduces the redundancy in those features. The system has been trained to select highly relevant features based on the wrapper model. PSO feeds the wrapper model an optimal subset of features. The proposed system uses four classifiers as an objective function to determine PVI: Support Vector Machine (SVM), K Nearest Neighbor (KNN), Decision Tree (DT), and Naïve Bayes (NB). The empirical results show that the proposed system achieved its best results with SVM. The proposed 2D-DWTPP model has been evaluated, and the results show remarkable efficiency in comparison with AlexNet and with a classifier without feature selection. Experimentally, our model has better accuracy (98.65) than AlexNet (63.5) and the classifier applied without feature selection (78.79).
Submitted 31 July, 2020;
originally announced July 2020.