-
A Terminology for Scientific Workflow Systems
Authors:
Frédéric Suter,
Tainã Coleman,
İlkay Altintaş,
Rosa M. Badia,
Bartosz Balis,
Kyle Chard,
Iacopo Colonnelli,
Ewa Deelman,
Paolo Di Tommaso,
Thomas Fahringer,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Johannes Köster,
Ulf Leser,
Kshitij Mehta,
Hilary Oliver,
J. -Luc Peterson,
Giovanni Pizzi,
Loïc Pottier,
Raül Sirvent,
Eric Suchyta,
Douglas Thain,
Sean R. Wilkinson,
Justin M. Wozniak
, et al. (1 additional authors not shown)
Abstract:
The term scientific workflow has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow management systems (WMSs)…
▽ More
The term scientific workflow has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow management systems (WMSs) have been developed to manage and run these workflows. However, no turnkey solution has emerged to address the diversity of scientific processes and the infrastructure on which they are implemented. Instead, new research problems requiring the execution of scientific workflows with some novel feature often lead to the development of an entirely new WMS. A direct consequence is that many existing WMSs share some salient features, offer similar functionalities, and can manage the same categories of workflows but also have some distinct capabilities. This situation makes researchers who develop workflows face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is the most appropriate for their application and for the resources available to them, or other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined their efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. This terminology is composed of fives axes: workflow characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs. Based on this terminology, this paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.
△ Less
Submitted 10 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Workflows Community Summit 2022: A Roadmap Revolution
Authors:
Rafael Ferreira da Silva,
Rosa M. Badia,
Venkat Bala,
Debbie Bard,
Peer-Timo Bremer,
Ian Buckley,
Silvina Caino-Lores,
Kyle Chard,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Manish Parashar,
Frederic Suter,
Nick Tyler,
Thomas Uram,
Ilkay Altintas,
Stefan Andersson,
William Arndt,
Juan Aznar,
Jonathan Bader,
Bartosz Balis,
Chris Blanton,
Kelly Rosa Braghetto,
Aharon Brodutch
, et al. (80 additional authors not shown)
Abstract:
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t…
▽ More
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC enable the management of many small-sized files, allow data reduction while ensuring high accuracy, orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
Analysis Cyberinfrastructure: Challenges and Opportunities
Authors:
Kevin Lannon,
Paul Brenner,
Mike Hildreth,
Kenyi Hurtado Anampa,
Alan Malta Rodrigues,
Kelci Mohrman,
Doug Thain,
Benjamin Tovar
Abstract:
Analysis cyberinfrastructure refers to the combination of software and computer hardware used to support late-stage data analysis in High Energy Physics (HEP). For the purposes of this white paper, late-stage data analysis refers specifically to the step of transforming the most reduced common data format produced by a given experimental collaboration (for example, nanoAOD for the CMS experiment)…
▽ More
Analysis cyberinfrastructure refers to the combination of software and computer hardware used to support late-stage data analysis in High Energy Physics (HEP). For the purposes of this white paper, late-stage data analysis refers specifically to the step of transforming the most reduced common data format produced by a given experimental collaboration (for example, nanoAOD for the CMS experiment) into histograms. In this white paper, we reflect on observations gathered from a recent experience with data analysis using a recent, python-based analysis framework, and extrapolate these experiences though the High-Luminosity LHC era as way of highlighting potential R\&D topics in analysis cyberinfrastructure.
△ Less
Submitted 15 March, 2022;
originally announced March 2022.
-
A Community Roadmap for Scientific Workflows Research and Development
Authors:
Rafael Ferreira da Silva,
Henri Casanova,
Kyle Chard,
Ilkay Altintas,
Rosa M Badia,
Bartosz Balis,
Tainã Coleman,
Frederik Coppens,
Frank Di Natale,
Bjoern Enders,
Thomas Fahringer,
Rosa Filgueira,
Grigori Fursin,
Daniel Garijo,
Carole Goble,
Dorran Howell,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Ulf Leser,
Maciej Malawski,
Kshitij Mehta,
Loïc Pottier,
Jonathan Ozik,
J. Luc Peterson
, et al. (4 additional authors not shown)
Abstract:
The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows…
▽ More
The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes.
△ Less
Submitted 8 October, 2021; v1 submitted 5 October, 2021;
originally announced October 2021.
-
Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development
Authors:
Rafael Ferreira da Silva,
Henri Casanova,
Kyle Chard,
Tainã Coleman,
Dan Laney,
Dong Ahn,
Shantenu Jha,
Dorran Howell,
Stian Soiland-Reys,
Ilkay Altintas,
Douglas Thain,
Rosa Filgueira,
Yadu Babuji,
Rosa M. Badia,
Bartosz Balis,
Silvina Caino-Lores,
Scott Callaghan,
Frederik Coppens,
Michael R. Crusoe,
Kaushik De,
Frank Di Natale,
Tu M. A. Do,
Bjoern Enders,
Thomas Fahringer,
Anne Fouilloux
, et al. (33 additional authors not shown)
Abstract:
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role i…
▽ More
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, long-running machine learning model training, amongst others, and thus increasingly rely on heterogeneous architectures that include CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between, which is not the case for systems that use different abstractions. More information: https://workflowsri.org/summits/technical
△ Less
Submitted 9 June, 2021;
originally announced June 2021.
-
Workflows Community Summit: Bringing the Scientific Workflows Community Together
Authors:
Rafael Ferreira da Silva,
Henri Casanova,
Kyle Chard,
Dan Laney,
Dong Ahn,
Shantenu Jha,
Carole Goble,
Lavanya Ramakrishnan,
Luc Peterson,
Bjoern Enders,
Douglas Thain,
Ilkay Altintas,
Yadu Babuji,
Rosa M. Badia,
Vivien Bonazzi,
Taina Coleman,
Michael Crusoe,
Ewa Deelman,
Frank Di Natale,
Paolo Di Tommaso,
Thomas Fahringer,
Rosa Filgueira,
Grigori Fursin,
Alex Ganose,
Bjorn Gruning
, et al. (20 additional authors not shown)
Abstract:
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) pla…
▽ More
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. As a result, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community that were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.
△ Less
Submitted 16 March, 2021;
originally announced March 2021.
-
SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization
Authors:
Jeff Kinnison,
Nathaniel Kremer-Herman,
Douglas Thain,
Walter Scheirer
Abstract:
Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial due in part to hyperparameters: user-configured values that control a model's ability to learn from data. Existing hyperparameter optimization methods are highly parall…
▽ More
Computer vision is experiencing an AI renaissance, in which machine learning models are expediting important breakthroughs in academic research and commercial applications. Effectively training these models, however, is not trivial due in part to hyperparameters: user-configured values that control a model's ability to learn from data. Existing hyperparameter optimization methods are highly parallel but make no effort to balance the search across heterogeneous hardware or to prioritize searching high-impact spaces. In this paper, we introduce a framework for massively Scalable Hardware-Aware Distributed Hyperparameter Optimization (SHADHO). Our framework calculates the relative complexity of each search space and monitors performance on the learning task over all trials. These metrics are then used as heuristics to assign hyperparameters to distributed workers based on their hardware. We first demonstrate that our framework achieves double the throughput of a standard distributed hyperparameter optimization framework by optimizing SVM for MNIST using 150 distributed workers. We then conduct model search with SHADHO over the course of one week using 74 GPUs across two compute clusters to optimize U-Net for a cell segmentation task, discovering 515 models that achieve a lower validation loss than standard U-Net.
△ Less
Submitted 22 January, 2018; v1 submitted 5 July, 2017;
originally announced July 2017.
-
DISTEA: Efficient Dynamic Impact Analysis for Distributed Systems
Authors:
Haipeng Cai,
Douglas Thain
Abstract:
Dynamic impact analysis is a fundamental technique for understanding the impact of specific program entities, or changes to them, on the rest of the program for concrete executions. However, existing techniques are either inapplicable or of very limited utility for distributed programs running in multiple concurrent processes. This paper presents DISTEA, a technique and tool for dynamic impact ana…
▽ More
Dynamic impact analysis is a fundamental technique for understanding the impact of specific program entities, or changes to them, on the rest of the program for concrete executions. However, existing techniques are either inapplicable or of very limited utility for distributed programs running in multiple concurrent processes. This paper presents DISTEA, a technique and tool for dynamic impact analysis of distributed systems. By partially ordering distributed method-execution events and inferring causality from the ordered events, DISTEA can predict impacts propagated both within and across process boundaries. We implemented DISTEA for Java and applied it to four distributed programs of various types and sizes, including two enterprise systems. We also evaluated the precision and practical usefulness of DISTEA, and demonstrated its application in program comprehension, through two case studies. The results show that DISTEA is highly scalable, more effective than existing alternatives, and instrumental to understanding distributed systems and their executions.
△ Less
Submitted 15 April, 2016;
originally announced April 2016.
-
Lessons Learned from an Experiment in Crowdsourcing Complex Citizen Engineering Tasks with Amazon Mechanical Turk
Authors:
Matthew Staffelbach,
Peter Sempolinski,
David Hachen,
Ahsan Kareem,
Tracy Kijewski-Correa,
Douglas Thain,
Daniel Wei,
Greg Madey
Abstract:
We investigate the feasibility of obtaining highly trustworthy results using crowdsourcing on complex engineering tasks. Crowdsourcing is increasingly seen as a potentially powerful way of increasing the supply of labor for solving society's problems. While applications in domains such as citizen-science, citizen-journalism or knowledge organization (e.g., Wikipedia) have seen many successful appl…
▽ More
We investigate the feasibility of obtaining highly trustworthy results using crowdsourcing on complex engineering tasks. Crowdsourcing is increasingly seen as a potentially powerful way of increasing the supply of labor for solving society's problems. While applications in domains such as citizen-science, citizen-journalism or knowledge organization (e.g., Wikipedia) have seen many successful applications, there have been fewer applications focused on solving engineering problems, especially those involving complex tasks. This may be in part because of concerns that low quality input into engineering analysis and design could result in failed structures leading to loss of life. We compared the quality of work of the anonymous workers of Amazon Mechanical Turk (AMT), an online crowdsourcing service, with the quality of work of expert engineers in solving the complex engineering task of evaluating virtual wind tunnel data graphs. On this representative complex engineering task, our results showed that there was little difference between expert engineers and crowdworkers in the quality of their work and explained reasons for these results. Along with showing that crowdworkers are effective at completing new complex tasks our paper supplies a number of important lessons that were learned in the process of collecting this data from AMT, which may be of value to other researchers.
△ Less
Submitted 29 June, 2014;
originally announced June 2014.