Adaptively Optimizing the Performance of HPX's Parallel Algorithms
Authors:
Karame Mohammadiporshokooh,
Steven R. Brandt,
R. Tohid,
Hartmut Kaiser
Abstract:
C++ Executors simplify the development of parallel algorithms by abstracting concurrency management across hardware architectures. They are designed to facilitate portability and uniformity of user-facing interfaces; however, in some cases they may lead to performance inefficiencies duo to suboptimal resource allocation for a particular workload or not leveraging certain hardware-specific capabili…
▽ More
C++ Executors simplify the development of parallel algorithms by abstracting concurrency management across hardware architectures. They are designed to facilitate portability and uniformity of user-facing interfaces; however, in some cases they may lead to performance inefficiencies duo to suboptimal resource allocation for a particular workload or not leveraging certain hardware-specific capabilities. To mitigate these inefficiencies we have developed a strategy, based on cores and chunking (workload), and integrated it into HPX's executor API. This strategy dynamically optimizes for workload distribution and resource allocation based on runtime metrics and overheads. In this paper, we introduce the model behind this strategy and evaluate its efficiency by testing its implementation (as an HPX executor) on both compute-bound and memory-bound workloads. The results show speedups across all tests, configurations, and workloads studied. offering improved performance through a familiar and user-friendly C++ executor API.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
Quantifying Overheads in Charm++ and HPX using Task Bench
Authors:
Nanmiao Wu,
Ioannis Gonidelis,
Simeng Liu,
Zane Fink,
Nikunj Gupta,
Karame Mohammadiporshokooh,
Patrick Diehl,
Hartmut Kaiser,
Laxmikant V. Kale
Abstract:
Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core architectures with light-weight threads, asynchronous executions, and smart scheduling. In this paper, we present the comparison of the AMT systems Charm++ and HPX with the main stream MPI, OpenMP, and MPI+OpenMP libraries using the Task Bench benchmarks. Charm++ is a parallel programming language based on C++, supporting st…
▽ More
Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core architectures with light-weight threads, asynchronous executions, and smart scheduling. In this paper, we present the comparison of the AMT systems Charm++ and HPX with the main stream MPI, OpenMP, and MPI+OpenMP libraries using the Task Bench benchmarks. Charm++ is a parallel programming language based on C++, supporting stackless tasks as well as light-weight threads asynchronously along with an adaptive runtime system. HPX is a C++ library for concurrency and parallelism, exposing C++ standards conforming API. First, we analyze the commonalities, differences, and advantageous scenarios of Charm++ and HPX in detail. Further, to investigate the potential overheads introduced by the tasking systems of Charm++ and HPX, we utilize an existing parameterized benchmark, Task Bench, wherein 15 different programming systems were implemented, e.g., MPI, OpenMP, MPI + OpenMP, and extend Task Bench by adding HPX implementations. We quantify the overheads of Charm++, HPX, and the main stream libraries in different scenarios where a single task and multi-task are assigned to each core, respectively. We also investigate each system's scalability and the ability to hide the communication latency.
△ Less
Submitted 21 July, 2022;
originally announced July 2022.