-
Digital Ecosystem for FAIR Time Series Data Management in Environmental System Science
Authors:
J. Bumberger,
M. Abbrent,
N. Brinckmann,
J. Hemmen,
R. Kunkel,
C. Lorenz,
P. Lünenschloß,
B. Palm,
T. Schnicke,
C. Schulz,
H. van der Schaaf,
D. Schäfer
Abstract:
Addressing the challenges posed by climate change, biodiversity loss, and environmental pollution requires comprehensive monitoring and effective data management strategies that are applicable across various scales in environmental system science. This paper introduces a versatile and transferable digital ecosystem for managing time series data, designed to adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). The system is highly adaptable, cloud-ready, and suitable for deployment in a wide range of settings, from small-scale projects to large-scale monitoring initiatives. The ecosystem comprises three core components: the Sensor Management System (SMS) for detailed metadata registration and management; time.IO, a platform for efficient time series data storage, transfer, and real-time visualization; and the System for Automated Quality Control (SaQC), which ensures data integrity through real-time analysis and quality assurance. The modular architecture, combined with standardized protocols and interfaces, ensures that the ecosystem can be easily transferred and deployed across different environments and institutions. This approach enhances data accessibility for a broad spectrum of stakeholders, including researchers, policymakers, and the public, while fostering collaboration and advancing scientific research in environmental monitoring.
Submitted 17 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
-
Understanding GPU Triggering APIs for MPI+X Communication
Authors:
Patrick G. Bridges,
Anthony Skjellum,
Evan D. Suggs,
Derek Schafer,
Purushotham V. Bangalore
Abstract:
GPU-enhanced architectures are now dominant in HPC systems, but message-passing communication involving GPUs with MPI has proven to be both complex and expensive, motivating new approaches that lower such costs. We compare and contrast stream/graph- and kernel-triggered MPI communication abstractions, whose principal purpose is to enhance the performance of communication when GPU kernels create or consume data for transfer through MPI operations. Researchers and practitioners have proposed multiple potential APIs for stream and/or kernel triggering that span various GPU architectures and approaches, including MPI-4 partitioned point-to-point communication, stream communicators, and explicit MPI stream/queue objects. Designs breaking backward compatibility with MPI are duly noted. Some of these strengthen or weaken the semantics of MPI operations. A key contribution of this paper is to promote community convergence toward a stream- and/or kernel-triggering abstraction by highlighting the common and differing goals and contributions of existing abstractions. We describe the design space in which these abstractions reside, their implicit or explicit use of stream and other non-MPI abstractions, their relationship to partitioned and persistent operations, and discuss their potential for added performance, how usable these abstractions are, and where functional and/or semantic gaps exist. Finally, we provide a taxonomy for stream- and kernel-triggered abstractions, including disambiguation of similar semantic terms, and consider directions for future standardization in MPI-5.
Submitted 31 July, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
MPI Implementation Profiling for Better Application Performance
Authors:
Riley Shipley,
Garrett Hooten,
David Boehme,
Derek Schafer,
Anthony Skjellum,
Olga Pearce
Abstract:
While application profiling has been a mainstay in the HPC community for years, profiling of MPI and other communication middleware has not received the same degree of exploration. This paper adds to the discussion of MPI profiling, contributing two general-purpose profiling methods as well as practical applications of these methods to an existing implementation. The ability to detect performance defects in MPI codes using these methods increases the potential of further research and development in communication optimization.
Submitted 19 February, 2024;
originally announced February 2024.
-
Implementation-Oblivious Transparent Checkpoint-Restart for MPI
Authors:
Yao Xu,
Leonid Belyaev,
Twinkle Jain,
Derek Schafer,
Anthony Skjellum,
Gene Cooperman
Abstract:
This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads for major available MPI implementations: "develop once, run everywhere". The new platform enables application developers to compile their application against any of the available standards-compliant MPI implementations, and test each MPI implementation according to performance or other features.
Submitted 26 September, 2023;
originally announced September 2023.
-
MPI Advance : Open-Source Message Passing Optimizations
Authors:
Amanda Bienz,
Derek Schafer,
Anthony Skjellum
Abstract:
The large variety of production implementations of the message passing interface (MPI) each provide unique and varying underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for the given architecture. Performance varies with MPI version, but application programmers are typically unable to achieve optimal performance with local MPI installations and therefore rely on whichever implementation is provided as a system install. This paper presents MPI Advance, a collection of libraries that sit on top of MPI, optimizing the underlying performance of any existing MPI library. The libraries provide optimizations for collectives, neighborhood collectives, partitioned communication, and GPU-aware communication.
Submitted 13 September, 2023;
originally announced September 2023.
-
A More Scalable Sparse Dynamic Data Exchange
Authors:
Andrew Geyko,
Gerald Collom,
Derek Schafer,
Patrick Bridges,
Amanda Bienz
Abstract:
Parallel architectures are continually increasing in performance and scale, while underlying algorithmic infrastructure often fails to take full advantage of available compute power. Within the context of MPI, irregular communication patterns create bottlenecks in parallel applications. One common bottleneck is the sparse dynamic data exchange, often required when forming communication patterns within applications. There is a large variety of approaches for these dynamic exchanges, with optimizations implemented directly in parallel applications. This paper proposes a novel API within an MPI extension library, allowing applications to utilize the variety of provided optimizations for sparse dynamic data exchange methods. Further, the paper presents novel locality-aware sparse dynamic data exchange algorithms. Finally, performance results show significant speedups of up to 20x with the novel locality-aware algorithms.
Submitted 3 April, 2024; v1 submitted 26 August, 2023;
originally announced August 2023.
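For readers unfamiliar with the sparse dynamic data exchange named in the abstract above, the following is a minimal single-process sketch of the problem, not the paper's API or its locality-aware algorithms. It assumes the common formulation in which every rank knows which ranks it sends to but not which ranks it will receive from. The function name `discover_sources` and the list-based "mailboxes" are hypothetical stand-ins: the element-wise sum plays the role of an all-reduce over per-rank destination flags, and the mailbox loop plays the role of wildcard-source receives.

```python
def discover_sources(send_lists):
    """Simulate one sparse dynamic data exchange.

    send_lists[r] is the list of ranks that rank r sends to.
    Returns (incoming, recv_lists): incoming[r] is how many messages
    rank r must wait for, and recv_lists[r] is the sorted list of
    ranks that sent to r.
    """
    n = len(send_lists)

    # Step 1 (stand-in for an all-reduce with summation): every rank
    # contributes a 0/1 flag per destination, so the summed vector
    # tells each rank how many messages it should expect.
    incoming = [0] * n
    for dests in send_lists:
        for d in set(dests):
            incoming[d] += 1

    # Step 2 (stand-in for sends plus wildcard-source receives): each
    # rank delivers to its destinations; a receiver stops once it has
    # collected the number of messages computed in step 1.
    mailboxes = [[] for _ in range(n)]
    for src, dests in enumerate(send_lists):
        for d in set(dests):
            mailboxes[d].append(src)

    recv_lists = []
    for r in range(n):
        # The count from step 1 is what lets a real receiver terminate.
        assert len(mailboxes[r]) == incoming[r]
        recv_lists.append(sorted(mailboxes[r]))
    return incoming, recv_lists


if __name__ == "__main__":
    # Four ranks: 0 sends to 1 and 2, 1 sends to 2, 3 sends to 0 and 2.
    counts, sources = discover_sources([[1, 2], [2], [], [0, 2]])
    print(counts)
    print(sources)
```

The summed count vector in step 1 costs work proportional to the number of ranks even when each rank has only a few neighbors, which is one reason this exchange becomes a bottleneck at scale.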
-
Collective-Optimized FFTs
Authors:
Evelyn Namugwanya,
Amanda Bienz,
Derek Schafer,
Anthony Skjellum
Abstract:
This paper measures the impact of the various alltoallv methods. Results are analyzed within Beatnik, a Z-model solver that is bottlenecked by HeFFTe and representative of applications that rely on FFTs.
Submitted 4 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Checkpoint-Restart Libraries Must Become More Fault Tolerant
Authors:
Anthony Skjellum,
Derek Schafer
Abstract:
Production MPI codes need checkpoint-restart (CPR) support. Clearly, checkpoint-restart libraries must be fault tolerant lest they open up a window of vulnerability for failures with byzantine outcomes. But, certain popular libraries that leverage MPI are evidently not fault tolerant. Nowadays, fault detection with automatic recovery without batch requeueing is a strong requirement for production environments. Thus, allowing deadlock and setting long timeouts are suboptimal for fault detection even when paired with conservative recovery from the penultimate checkpoint.
When MPI is used as a communication mechanism within a CPR library, such libraries must offer fault-tolerant extensions with minimal detection, isolation, mitigation, and potential recovery semantics to aid the CPR library's fail-backward. Communication between MPI and the checkpoint library regarding system health may be valuable. For fault-tolerant MPI programs (e.g., using APIs like FA-MPI, Stages/Reinit, or ULFM), the checkpoint library must cooperate with the extended model or else invalidate fault-tolerant operation.
Submitted 20 December, 2021;
originally announced December 2021.
-
MPI's Language Bindings are Holding MPI Back
Authors:
Martin Ruefenacht,
Derek Schafer,
Anthony Skjellum,
Purushotham V. Bangalore
Abstract:
Over the past two decades, C++ has been adopted as a major HPC language (displacing C to a large extent, and Fortran to some degree as well). Idiomatic C++ is clearly how C++ is being used nowadays. But MPI's syntax and semantics are defined and extended with C and Fortran interfaces that align with the capabilities and limitations of C89 and Fortran-77. Unfortunately, the language-independent specification also clearly reflects the intersection of what these languages could syntactically and semantically manage at the outset in 1993, rather than being truly language neutral. In this paper, we propose a modern C++ language interface to replace the C language binding for C++ programmers with an upward-compatible architecture that leverages all the benefits of C++11-20 for performance, productivity, and interoperability with other popular C++ libraries and interfaces for HPC. Demand is demonstrably strong for this second attempt at language support for C++ in MPI after the original interface, which was added in MPI-2, then was found to lack specific benefits over the C binding, and so was subsequently removed in MPI-3. Since C++ and its idiomatic usage have evolved since the original C++ language binding was removed from the standard, this new effort is both timely and important for MPI applications. Also, many C++ application programmers create their own, ad hoc shim libraries over MPI to provide some degree of abstraction unique to their particular project, which means many such abstraction libraries are being devised without any specific commonality other than the demand for such.
Submitted 22 July, 2021;
originally announced July 2021.
-
A Survey on Predictive Maintenance for Industry 4.0
Authors:
Christian Krupitzer,
Tim Wagenhals,
Marwin Züfle,
Veronika Lesch,
Dominik Schäfer,
Amin Mozaffarin,
Janick Edinger,
Christian Becker,
Samuel Kounev
Abstract:
Production issues at Volkswagen in 2016 led to dramatic losses in sales of up to 400 million Euros per week. This example shows the huge financial impact of a working production facility for companies. Especially in the data-driven domains of Industry 4.0 and Industrial IoT with intelligent, connected machines, a conventional, static maintenance schedule seems to be old-fashioned. In this paper, we present a survey on the current state of the art in predictive maintenance for Industry 4.0. Based on a structured literature survey, we present a classification of predictive maintenance in the context of Industry 4.0 and discuss recent developments in this area.
Submitted 5 February, 2020;
originally announced February 2020.
-
A Survey on Human Machine Interaction in Industry 4.0
Authors:
Christian Krupitzer,
Sebastian Müller,
Veronika Lesch,
Marwin Züfle,
Janick Edinger,
Alexander Lemken,
Dominik Schäfer,
Samuel Kounev,
Christian Becker
Abstract:
Industry 4.0 or Industrial IoT both describe new paradigms for seamless interaction between humans and machines. Both concepts rely on intelligent, inter-connected cyber-physical production systems that are able to control the process flow of industrial production. As those machines take many decisions autonomously and further interact with production and manufacturing planning systems, the integration of human users requires new paradigms. In this paper, we provide an analysis of the current state-of-the-art in human-machine interaction in the Industry 4.0 domain. We focus on new paradigms that integrate the application of augmented and virtual reality technology. Based on our analysis, we further provide a discussion of research challenges.
Submitted 3 February, 2020;
originally announced February 2020.
-
Extending the Message Passing Interface (MPI) with User-Level Schedules
Authors:
Derek Schafer,
Sheikh Ghafoor,
Daniel Holmes,
Martin Ruefenacht,
Anthony Skjellum
Abstract:
Composability is one of seven reasons for the long-standing and continuing success of MPI. Extending MPI by composing its operations with user-level operations provides useful integration with the progress engine and completion notification methods of MPI. However, the existing extensibility mechanism in MPI (generalized requests) is not widely utilized and has significant drawbacks.
MPI can be generalized via scheduled communication primitives, for example, by utilizing implementation techniques from existing MPI-3 nonblocking collectives and from forthcoming MPI-4 persistent and partitioned APIs. Non-trivial schedules are used internally in some MPI libraries; but, they are not accessible to end-users.
Message-based communication patterns can be built as libraries on top of MPI. Such libraries can have comparable implementation maturity and potentially higher performance than MPI library code, but do not require intimate knowledge of the MPI implementation. Libraries can provide performance-portable interfaces that cross MPI implementation boundaries. The ability to compose additional user-defined operations using the same progress engine benefits all kinds of general purpose HPC libraries.
We propose a definition for MPI schedules: a user-level programming model suitable for creating persistent collective communication composed with new application-specific sequences of user-defined operations managed by MPI and fully integrated with MPI progress and completion notification. The API proposed offers a path to standardization for extensible communication schedules involving user-defined operations. Our approach has the potential to introduce event-driven programming into MPI (beyond the tools interface), although connecting schedules with events comprises future work.
Early performance results described here are promising and indicate strong overlap potential.
Submitted 25 September, 2019;
originally announced September 2019.
-
Combination interventions for Hepatitis C and Cirrhosis reduction among people who inject drugs: An agent-based, networked population simulation experiment
Authors:
Bilal Khan,
Ian Duncan,
Mohamad Saad,
Daniel Schaefer,
Ashly Jordan,
Daniel Smith,
Alan Neaigus,
Don Des Jarlais,
Holly Hagan,
Kirk Dombrowski
Abstract:
Hepatitis C virus (HCV) infection is endemic in people who inject drugs (PWID), with prevalence estimates above 60 percent for PWID in the United States. Previous modeling studies suggest that direct acting antiviral (DAA) treatment can lower overall prevalence in this population, but treatment is often delayed until the onset of advanced liver disease (fibrosis stage 3 or later) due to cost. Lower cost interventions featuring syringe access (SA) and medically assisted treatment (MAT) for addiction are known to be less costly, but have shown mixed results in lowering HCV rates below current levels. Little is known about the potential synergistic effects of combining DAA and MAT treatment, and large-scale tests of combined interventions are rare. While simulation experiments can reveal likely long-term effects, most prior simulations have been performed on closed populations of model agents--a scenario quite different from the open, mobile populations known to most health agencies. This paper uses data from the Centers for Disease Control's National HIV Behavioral Surveillance project, IDU round 3, collected in New York City in 2012 by the New York City Department of Health and Mental Hygiene to parameterize simulations of open populations. Our results show that, in an open population, SA/MAT by itself has only small effects on HCV prevalence, while DAA treatment by itself can significantly lower both HCV and HCV-related advanced liver disease prevalence. More importantly, the simulation experiments suggest that cost effective synergistic combinations of the two strategies can dramatically reduce HCV incidence. We conclude that adopting SA/MAT implementations alongside DAA interventions can play a critical role in reducing the long-term consequences of ongoing infection.
Submitted 9 October, 2017;
originally announced October 2017.