-
Rucio - Scientific data management
Authors:
Martin Barisits,
Thomas Beermann,
Frank Berghaus,
Brian Bockelman,
Joaquin Bogado,
David Cameron,
Dimitrios Christidis,
Diego Ciangottini,
Gancho Dimitrov,
Markus Elsing,
Vincent Garonne,
Alessandro di Girolamo,
Luc Goossens,
Wen Guan,
Jaroslav Guenther,
Tomas Javurek,
Dietmar Kuhn,
Mario Lassnig,
Fernando Lopez,
Nicolo Magini,
Angelos Molfetas,
Armin Nairz,
Farid Ould-Saada,
Stefan Prenner,
Cedric Serfon, et al. (5 additional authors not shown)
Abstract:
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and now is continuously extended to support the LHC experiments and other diverse scientific communities. In this article, we detail the fundamental concepts of Rucio, describe the architecture along with implementation details, and give operational experience from production usage.
Submitted 6 June, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Federating distributed storage for clouds in ATLAS
Authors:
Frank Berghaus,
Kevin Casteels,
Alessandro Di Girolamo,
Colson Driemel,
Marcus Ebert,
Fabrizio Furano,
Fernado Galindo,
Mario Lassnig,
Colin Leavett-Brown,
Michael Paterson,
Cedric Serfon,
Rolf Seuster,
Randall Sobie,
Reda Tafirout,
Ryan Paul Taylor
Abstract:
Input data for applications that run in cloud computing centres can be stored at distant repositories, often with multiple copies of the popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. A federation would locate the closest copy of the data on the basis of GeoIP information. Currently we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry standards for connection protocols, such as Amazon's S3 and Microsoft's Azure, as well as WebDAV and HTTP. Dynafed functions as an abstraction layer under which protocol-dependent authentication details are hidden from the user, who only needs to provide an X509 certificate. We have set up an instance of Dynafed and integrated it into the ATLAS data distribution management system. We report on the challenges faced during the installation and integration. We have tested ATLAS analysis jobs submitted by the PanDA production system and we report on our first experiences with its operation.
Submitted 26 October, 2018;
originally announced October 2018.
-
The case for preserving our knowledge and data in physics experiments
Authors:
Frank Berghaus
Abstract:
This proceeding covers tools and technologies at our disposal for scientific data preservation and shows that this extends the scientific reach of our experiments. It is cost-efficient to warehouse data from completed experiments on the tape archives of our national and international laboratories. These subject-specific data stores also offer the technologies to capture and archive knowledge about experiments in the form of technical notes, electronic logs, websites, etc. Furthermore, it is possible to archive our source code and computing environments. The paper illustrates these challenges with experience from preserving the LEP data for the long term.
Submitted 4 December, 2017;
originally announced December 2017.
-
Status Report of the DPHEP Collaboration: A Global Effort for Sustainable Data Preservation in High Energy Physics
Authors:
DPHEP Collaboration,
Silvia Amerio,
Roberto Barbera,
Frank Berghaus,
Jakob Blomer,
Andrew Branson,
Germán Cancio,
Concetta Cartaro,
Gang Chen,
Sünje Dallmeier-Tiessen,
Cristinel Diaconu,
Gerardo Ganis,
Mihaela Gheata,
Takanori Hara,
Ken Herner,
Mike Hildreth,
Roger Jones,
Stefan Kluth,
Dirk Krücker,
Kati Lassila-Perini,
Marcello Maggi,
Jesus Marco de Lucas,
Salvatore Mele,
Alberto Pace,
Matthias Schröder, et al. (9 additional authors not shown)
Abstract:
Data from High Energy Physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by large collider-based experiments and investigated the technical and organizational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP and an extended blueprint paper was published in 2012. In July 2014 the DPHEP collaboration was formed as a result of the signature of the Collaboration Agreement by seven large funding agencies (others have since joined or are in the process of acquisition) and in June 2015 the first DPHEP Collaboration Workshop and Collaboration Board meeting took place.
This status report of the DPHEP collaboration details the progress during the period from 2013 to 2015 inclusive.
Submitted 17 February, 2016; v1 submitted 7 December, 2015;
originally announced December 2015.
-
Dynamic web cache publishing for IaaS clouds using Shoal
Authors:
Ian Gable,
Michael Chester,
Patrick Armstrong,
Frank Berghaus,
Andre Charbonneau,
Colin Leavett-Brown,
Michael Paterson,
Robert Prior,
Randall Sobie,
Ryan Taylor
Abstract:
We have developed a highly scalable application, called Shoal, for tracking and utilizing a distributed set of HTTP web caches. Squid servers advertise their existence to the Shoal server via AMQP messaging by running Shoal Agent. The Shoal server provides a simple REST interface that allows clients to determine their closest Squid cache. Our goal is to dynamically instantiate Squid caches on IaaS clouds in response to client demand. Shoal provides the VMs on IaaS clouds with the location of the nearest dynamically instantiated Squid cache. In this paper, we describe the design and performance of Shoal.
Submitted 31 October, 2013;
originally announced November 2013.
-
Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics
Authors:
The ATLAS Collaboration,
G. Aad,
E. Abat,
B. Abbott,
J. Abdallah,
A. A. Abdelalim,
A. Abdesselam,
O. Abdinov,
B. Abi,
M. Abolins,
H. Abramowicz,
B. S. Acharya,
D. L. Adams,
T. N. Addy,
C. Adorisio,
P. Adragna,
T. Adye,
J. A. Aguilar-Saavedra,
M. Aharrouche,
S. P. Ahlen,
F. Ahles,
A. Ahmad,
H. Ahmed,
G. Aielli,
T. Akdogan, et al. (2587 additional authors not shown)
Abstract:
A detailed study is presented of the expected performance of the ATLAS detector. The reconstruction of tracks, leptons, photons, missing energy and jets is investigated, together with the performance of b-tagging and the trigger. The physics potential for a variety of interesting physics processes, within the Standard Model and beyond, is examined. The study comprises a series of notes based on simulations of the detector and physics processes, with particular emphasis given to the data expected from the first years of operation of the LHC at CERN.
Submitted 14 August, 2009; v1 submitted 28 December, 2008;
originally announced January 2009.