-
THE CAVES Project - Collaborative Analysis Versioning Environment System; THE CODESH Project - Collaborative Development Shell
Authors:
Dimitri Bourilkov
Abstract:
A key feature of collaboration in science and software development is a log of what is being done and how, both for private use and reuse and for sharing selected parts with collaborators, who today are most often distributed geographically on an ever larger scale. Better still is a log that is automatic, created on the fly while a scientist or software developer works in a habitual way, with no extra effort. The CAVES and CODESH projects address this problem in a novel way, building on the concepts of virtual state and virtual transition to provide an automatic persistent logbook for sessions of data analysis or software development in a collaborating group. A repository of sessions can be configured dynamically to record and make available the knowledge accumulated in the course of a scientific or software endeavor. Access can be controlled to define logbooks of private sessions and sessions shared within or between collaborating groups.
Submitted 22 October, 2004;
originally announced October 2004.
-
Virtual Data in CMS Production
Authors:
A. Arbree,
P. Avery,
D. Bourilkov,
R. Cavanaugh,
G. Graham,
S. Katageri,
J. Rodriguez,
J. Voeckler,
M. Wilde
Abstract:
Initial applications of the GriPhyN Chimera Virtual Data System have been performed within the context of CMS Production of Monte Carlo Simulated Data. The GriPhyN Chimera system consists of four primary components: 1) a Virtual Data Language, which is used to describe virtual data products, 2) a Virtual Data Catalog, which is used to store virtual data entries, 3) an Abstract Planner, which resolves all dependencies of a particular virtual data product and forms a location and existence independent plan, 4) a Concrete Planner, which maps an abstract, logical plan onto concrete, physical grid resources accounting for staging in/out files and publishing results to a replica location service. A CMS Workflow Planner, MCRunJob, is used to generate virtual data products using the Virtual Data Language. Subsequently, a prototype workflow manager, known as WorkRunner, is used to schedule the instantiation of virtual data products across a grid.
Submitted 31 May, 2003;
originally announced June 2003.
-
The CMS Integration Grid Testbed
Authors:
Gregory E. Graham,
M. Anzar Afaq,
Shafqat Aziz,
L. A. T. Bauerdick,
Michael Ernst,
Joseph Kaiser,
Natalia Ratnikova,
Hans Wenzel,
Yujun Wu,
Erik Aslakson,
Julian Bunn,
Saima Iqbal,
Iosif Legrand,
Harvey Newman,
Suresh Singh,
Conrad Steenberg,
James Branson,
Ian Fisk,
James Letts,
Adam Arbree,
Paul Avery,
Dimitri Bourilkov,
Richard Cavanaugh,
Jorge Rodriguez,
Suchindra Kategari
, et al. (5 additional authors not shown)
Abstract:
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Gridwide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent based Mona Lisa. Domain specific software is packaged and installed using the Distribution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous two month span in Fall of 2002, over 1 million official CMS GEANT based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids.
Submitted 10 June, 2003; v1 submitted 30 May, 2003;
originally announced May 2003.