-
Federating distributed storage for clouds in ATLAS
Authors:
Frank Berghaus,
Kevin Casteels,
Alessandro Di Girolamo,
Colson Driemel,
Marcus Ebert,
Fabrizio Furano,
Fernando Galindo,
Mario Lassnig,
Colin Leavett-Brown,
Michael Paterson,
Cedric Serfon,
Rolf Seuster,
Randall Sobie,
Reda Tafirout,
Ryan Paul Taylor
Abstract:
Input data for applications that run in cloud computing centres can be stored at distant repositories, often with multiple copies of the popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. A federation would locate the closest copy of the data on the basis of GeoIP information. Currently we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry-standard connection protocols, such as Amazon's S3 and Microsoft's Azure, as well as WebDAV and HTTP. Dynafed functions as an abstraction layer under which protocol-dependent authentication details are hidden from the user, so that the user only needs to provide an X509 certificate. We have set up an instance of Dynafed and integrated it into the ATLAS data distribution management system. We report on the challenges faced during the installation and integration. We have tested ATLAS analysis jobs submitted by the PanDA production system and we report on our first experiences with its operation.
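The idea of locating the closest copy from geographic information can be illustrated with a minimal sketch. This is not Dynafed's implementation. It assumes the client and each replica have already been resolved to latitude/longitude (for example by a GeoIP lookup, which is not shown), and the endpoint names and coordinates below are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def closest_replica(client, replicas):
    """Return the name of the replica geographically nearest to the client.

    client   -- (lat, lon) tuple, assumed to come from a GeoIP lookup
    replicas -- dict mapping replica name to its (lat, lon) tuple
    """
    return min(replicas, key=lambda name: haversine_km(*client, *replicas[name]))

# Hypothetical storage endpoints with approximate coordinates (illustration only).
REPLICAS = {
    "victoria": (48.43, -123.37),
    "chicago": (41.88, -87.63),
    "geneva": (46.23, 6.05),
}

if __name__ == "__main__":
    # A client located near Vancouver is directed to the Victoria replica.
    print(closest_replica((49.28, -123.12), REPLICAS))
```

A real federation would also need to consider endpoint availability and load, which this distance-only sketch ignores.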
Submitted 26 October, 2018;
originally announced October 2018.
-
Status Report of the DPHEP Collaboration: A Global Effort for Sustainable Data Preservation in High Energy Physics
Authors:
DPHEP Collaboration,
Silvia Amerio,
Roberto Barbera,
Frank Berghaus,
Jakob Blomer,
Andrew Branson,
Germán Cancio,
Concetta Cartaro,
Gang Chen,
Sünje Dallmeier-Tiessen,
Cristinel Diaconu,
Gerardo Ganis,
Mihaela Gheata,
Takanori Hara,
Ken Herner,
Mike Hildreth,
Roger Jones,
Stefan Kluth,
Dirk Krücker,
Kati Lassila-Perini,
Marcello Maggi,
Jesus Marco de Lucas,
Salvatore Mele,
Alberto Pace,
Matthias Schröder
, et al. (9 additional authors not shown)
Abstract:
Data from High Energy Physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by large collider-based experiments and investigated the technical and organizational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP and an extended blueprint paper was published in 2012. In July 2014 the DPHEP collaboration was formed as a result of the signature of the Collaboration Agreement by seven large funding agencies (others have since joined or are in the process of acquisition) and in June 2015 the first DPHEP Collaboration Workshop and Collaboration Board meeting took place.
This status report of the DPHEP collaboration details the progress during the period from 2013 to 2015 inclusive.
Submitted 17 February, 2016; v1 submitted 7 December, 2015;
originally announced December 2015.
-
Dynamic web cache publishing for IaaS clouds using Shoal
Authors:
Ian Gable,
Michael Chester,
Patrick Armstrong,
Frank Berghaus,
Andre Charbonneau,
Colin Leavett-Brown,
Michael Paterson,
Robert Prior,
Randall Sobie,
Ryan Taylor
Abstract:
We have developed a highly scalable application, called Shoal, for tracking and utilizing a distributed set of HTTP web caches. Squid servers advertise their existence to the Shoal server via AMQP messaging by running Shoal Agent. The Shoal server provides a simple REST interface that allows clients to determine their closest Squid cache. Our goal is to dynamically instantiate Squid caches on IaaS clouds in response to client demand. Shoal provides the VMs on IaaS clouds with the location of the nearest dynamically instantiated Squid Cache. In this paper, we describe the design and performance of Shoal.
Submitted 31 October, 2013;
originally announced November 2013.
-
HTC Scientific Computing in a Distributed Cloud Environment
Authors:
R. Sobie,
A. Agarwal,
I. Gable,
C. Leavett-Brown,
M. Paterson,
R. Taylor,
A. Charbonneau,
R. Impey,
W. Podaima
Abstract:
This paper describes the use of a distributed cloud computing system for high-throughput computing (HTC) scientific applications. The distributed cloud computing system is composed of a number of separate Infrastructure-as-a-Service (IaaS) clouds that are utilized in a unified infrastructure. The distributed cloud has been in production-quality operation for two years, with approximately 500,000 completed jobs; a typical workload has 500 simultaneous embarrassingly-parallel jobs that run for approximately 12 hours. We review the design and implementation of the system, which is based on pre-existing components and a number of custom components. We discuss the operation of the system and describe our plans for expansion to more sites and increased computing capacity.
Submitted 7 February, 2013;
originally announced February 2013.
-
Data Intensive High Energy Physics Analysis in a Distributed Cloud
Authors:
R. J. Sobie,
A. Agarwal,
M. Anderson,
P. Armstrong,
K. Fransham,
I. Gable,
D. Harris,
C. Leavett-Brown,
M. Paterson,
D. Penfold-Brown,
M. Vliet,
A. Charbonneau,
R. Impey,
W. Podaima
Abstract:
We show that distributed Infrastructure-as-a-Service (IaaS) compute clouds can be effectively used for the analysis of high energy physics data. We have designed a distributed cloud system that works with any application using large input data sets requiring a high throughput computing environment. The system uses IaaS-enabled science and commercial clusters in Canada and the United States. We describe the process in which a user prepares an analysis virtual machine (VM) and submits batch jobs to a central scheduler. The system boots the user-specific VM on one of the IaaS clouds, runs the jobs and returns the output to the user. The user application accesses a central database for calibration data during its execution. Similarly, the data is stored in a central location and streamed by the running application. The system can easily run one hundred simultaneous jobs in an efficient manner and should scale to many hundreds and possibly thousands of user jobs.
Submitted 1 January, 2011;
originally announced January 2011.
-
Cloud Scheduler: a resource manager for distributed compute clouds
Authors:
P. Armstrong,
A. Agarwal,
A. Bishop,
A. Charbonneau,
R. Desmarais,
K. Fransham,
N. Hill,
I. Gable,
S. Gaudet,
S. Goliath,
R. Impey,
C. Leavett-Brown,
J. Ouellete,
M. Paterson,
C. Pritchet,
D. Penfold-Brown,
W. Podaima,
D. Schade,
R. J. Sobie
Abstract:
The availability of Infrastructure-as-a-Service (IaaS) computing clouds gives researchers access to a large set of new resources for running complex scientific applications. However, exploiting cloud resources for large numbers of jobs requires significant effort and expertise. In order to make it simple and transparent for researchers to deploy their applications, we have developed a virtual machine resource manager (Cloud Scheduler) for distributed compute clouds. Cloud Scheduler boots and manages the user-customized virtual machines in response to a user's job submission. We describe the motivation and design of the Cloud Scheduler and present results on its use on both science and commercial clouds.
Submitted 30 June, 2010;
originally announced July 2010.