-
Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 AI diagnosis
Authors:
Iacopo Colonnelli,
Barbara Cantalupo,
Concetto Spampinato,
Matteo Pennisi,
Marco Aldinucci
Abstract:
HPC is an enabling platform for AI. The introduction of AI workloads in the HPC applications basket has non-trivial consequences both on the way of designing AI applications and on the way of providing HPC computing. This is the leitmotif of the convergence between HPC and AI. The formalized definition of AI pipelines is one of the milestones of HPC-AI convergence. If well conducted, it yields, on the one hand, portable and scalable applications. On the other hand, it is crucial for the reproducibility of scientific pipelines. In this work, we advocate the StreamFlow Workflow Management System as a crucial ingredient to define a parametric pipeline, called "CLAIRE COVID-19 Universal Pipeline," which is able to explore the optimization space of methods to classify COVID-19 lung lesions from CT scans, compare them for accuracy, and therefore set a performance baseline. The universal pipeline automates the training of many different Deep Neural Networks (DNNs) with many different hyperparameters. It therefore requires massive computing power, which traditional HPC infrastructure can provide thanks to the portability-by-design of pipelines designed with StreamFlow. Using the universal pipeline, we identified a DNN reaching over 90% accuracy in detecting COVID-19 lesions in CT scans.
Submitted 2 August, 2021;
originally announced August 2021.
-
StreamFlow: cross-breeding cloud with HPC
Authors:
Iacopo Colonnelli,
Barbara Cantalupo,
Ivan Merelli,
Marco Aldinucci
Abstract:
Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with the declarative description of potentially complex execution environments, and that makes it possible to execute a workflow across multiple sites not sharing a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single-cell transcriptomic data analysis.
Submitted 30 August, 2020; v1 submitted 4 February, 2020;
originally announced February 2020.
-
The EU DataGrid Workload Management System: towards the second major release
Authors:
G. Avellino,
S. Barale,
S. Beco,
B. Cantalupo,
D. Colling,
F. Giacomini,
A. Gianelle,
A. Guarise,
A. Krenek,
D. Kouril,
A. Maraschini,
L. Matyska,
M. Mezzadri,
S. Monforte,
M. Mulac,
F. Pacini,
M. Pappalardo,
R. Peluso,
J. Pospisil,
F. Prelz,
E. Ronchieri,
M. Ruda,
L. Salconi,
Z. Salvet,
M. Sgaravatto
, et al. (4 additional authors not shown)
Abstract:
In the first phase of the European DataGrid project, the 'workload management' package (WP1) implemented a working prototype, providing users with an environment that allows them to define and submit jobs to the Grid, and that is able to find and use the "best" resources for these jobs. Application users have now had about a year of experience with this first release of the workload management system. The experience acquired, the feedback received from users, and the need to plug in new components implementing new functionalities triggered an update of the existing architecture. A description of this revised and complemented workload management system is given.
Submitted 13 June, 2003;
originally announced June 2003.
-
The first deployment of workload management services on the EU DataGrid Testbed: feedback on design and implementation
Authors:
G. Avellino,
S. Beco,
B. Cantalupo,
F. Pacini,
A. Terracina,
A. Maraschini,
D. Colling,
S. Monforte,
M. Pappalardo,
L. Salconi,
F. Giacomini,
E. Ronchieri,
D. Kouril,
A. Krenek,
L. Matyska,
M. Mulac,
J. Pospisil,
M. Ruda,
Z. Salvet,
J. Sitera,
M. Vocu,
M. Mezzadri,
F. Prelz,
A. Gianelle,
R. Peluso
, et al. (4 additional authors not shown)
Abstract:
Application users have now had about a year of experience with the standardized resource brokering services provided by the 'workload management' package of the EU DataGrid project (WP1). Understanding, shaping and pushing the limits of the system has provided valuable feedback on both its design and implementation. A digest of the lessons and "better practices" that were learned and applied towards the second major release of the software is given.
Submitted 31 May, 2003;
originally announced June 2003.