-
INDIGO-DataCloud: A data and computing platform to facilitate seamless access to e-infrastructures
Authors:
INDIGO-DataCloud Collaboration,
Davide Salomoni,
Isabel Campos,
Luciano Gaido,
Jesus Marco de Lucas,
Peter Solagna,
Jorge Gomes,
Ludek Matyska,
Patrick Fuhrman,
Marcus Hardt,
Giacinto Donvito,
Lukasz Dutka,
Marcin Plociennik,
Roberto Barbera,
Ignacio Blanquer,
Andrea Ceccanti,
Mario David,
Cristina Duma,
Alvaro López-García,
Germán Moltó,
Pablo Orviz,
Zdenek Sustr,
Matthew Viljoen,
Fernando Aguilar, et al. (40 additional authors not shown)
Abstract:
This paper describes the achievements of the H2020 project INDIGO-DataCloud. The project has provided e-infrastructures with tools, applications and cloud framework enhancements to manage the demanding requirements of scientific communities, either locally or through enhanced interfaces. The middleware developed makes it possible to federate hybrid resources and to easily write, port and run scientific applications in the cloud. In particular, we have extended existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT interfederation policies, thus guaranteeing transparency and trust in the provisioning of such services. Our middleware facilitates the execution of applications using containers on Cloud and Grid based infrastructures, as well as on HPC clusters. Our developments are freely downloadable as open source components, and are already being integrated into many scientific applications.
Submitted 5 February, 2019; v1 submitted 6 November, 2017;
originally announced November 2017.
-
INDIGO-Datacloud: foundations and architectural description of a Platform as a Service oriented to scientific computing
Authors:
D. Salomoni,
I. Campos,
L. Gaido,
G. Donvito,
M. Antonacci,
P. Fuhrman,
J. Marco,
A. Lopez-Garcia,
P. Orviz,
I. Blanquer,
M. Caballer,
G. Molto,
M. Plociennik,
M. Owsiak,
M. Urbaniak,
M. Hardt,
A. Ceccanti,
B. Wegh,
J. Gomes,
M. David,
C. Aiftimiei,
L. Dutka,
B. Kryza,
T. Szepieniec,
S. Fiore, et al. (10 additional authors not shown)
Abstract:
In this paper we describe the architecture of a Platform as a Service (PaaS) oriented to computing and data analysis. In order to clarify the choices we made, we explain the features using practical examples, applied to several known usage patterns in the area of HEP computing. The proposed architecture is devised to provide researchers with a unified view of distributed computing infrastructures, with a focus on facilitating seamless access. In this respect, the Platform is able to profit from the most recent developments for computing and processing large amounts of data, and to exploit current storage and preservation technologies, with the appropriate mechanisms to ensure security and privacy.
Submitted 22 April, 2016; v1 submitted 31 March, 2016;
originally announced March 2016.
-
Distributed Offline Data Reconstruction in BaBar
Authors:
Teela Pulliam,
Peter Elmer,
Alvise Dorigo
Abstract:
The BaBar experiment at SLAC is in its fourth year of running. The data processing system has been continuously evolving to meet the challenges of higher luminosity running and the increasing bulk of data to re-process each year. To meet these goals a two-pass processing architecture has been adopted, where 'rolling calibrations' are quickly calculated on a small fraction of the events in the first pass and the bulk data reconstruction is done in the second. This allows for quick detector feedback in the first pass and for the parallelization of the second pass over two or more separate farms. This two-pass system also allows for the distribution of processing farms off-site. The first such site has been set up at INFN Padova. The challenges met here were many. The software was ported to a fully Linux-based, commodity hardware system. The raw dataset, 90 TB, was imported from SLAC utilizing a 155 Mbps network link. A system for quality control and export of the processed data back to SLAC was developed. Between SLAC and Padova we are currently running three pass-one farms, with 32 CPUs each, and nine pass-two farms with 64 to 80 CPUs each. The pass-two farms can process between 2 and 4 million events per day. Details about the implementation and performance of the system will be presented.
Submitted 13 June, 2003;
originally announced June 2003.