Simulating Data Access Profiles of Computational Jobs in Data Grids
Authors:
Volodimir Begy,
Joeri Hermans,
Martin Barisits,
Mario Lassnig,
Erich Schikuta
Abstract:
The data access patterns of applications running in computing grids are changing due to the recent proliferation of high-speed local and wide area networks. Data-intensive jobs are no longer strictly required to run at the computing sites where their input data are located. Instead, jobs may access the data using arbitrary combinations of data placement, stage-in, and remote data access. These data access profiles exhibit partially non-overlapping throughput bottlenecks, which can be exploited to minimize the time jobs spend waiting for input data. In this work we present a novel grid computing simulator that places heavy emphasis on the various data access profiles. The fundamental assumptions underlying our simulator are justified by empirical experiments performed in the Worldwide LHC Computing Grid (WLCG) at CERN. We demonstrate how to calibrate the simulator parameters to the true system using posterior inference with likelihood-free Markov Chain Monte Carlo. We then validate the simulator's output against an authentic production workload from WLCG, demonstrating its remarkable accuracy.
Submitted 12 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.
Rucio - Scientific data management
Authors:
Martin Barisits,
Thomas Beermann,
Frank Berghaus,
Brian Bockelman,
Joaquin Bogado,
David Cameron,
Dimitrios Christidis,
Diego Ciangottini,
Gancho Dimitrov,
Markus Elsing,
Vincent Garonne,
Alessandro di Girolamo,
Luc Goossens,
Wen Guan,
Jaroslav Guenther,
Tomas Javurek,
Dietmar Kuhn,
Mario Lassnig,
Fernando Lopez,
Nicolo Magini,
Angelos Molfetas,
Armin Nairz,
Farid Ould-Saada,
Stefan Prenner,
Cedric Serfon,
et al. (5 additional authors not shown)
Abstract:
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is now continuously extended to support the LHC experiments and other diverse scientific communities. In this article, we detail the fundamental concepts of Rucio, describe the architecture along with implementation details, and report operational experience from production usage.
Submitted 6 June, 2019; v1 submitted 26 February, 2019; originally announced February 2019.