-
Adapting LIGO workflows to run in the Open Science Grid
Authors:
Edgar Fajardo,
Frank Wuerthwein,
Brian Bockelman,
Miron Livny,
Greg Thain,
James Alexander Clark,
Peter Couvares,
Josh Willis
Abstract:
During the first observation run the LIGO collaboration needed to offload some of its most, intense CPU workflows from its dedicated computing sites to opportunistic resources. Open Science Grid enabled LIGO to run PyCbC, RIFT and Bayeswave workflows to seamlessly run in a combination of owned and opportunistic resources. One of the challenges is enabling the workflows to use several heterogeneous…
▽ More
During the first observation run the LIGO collaboration needed to offload some of its most, intense CPU workflows from its dedicated computing sites to opportunistic resources. Open Science Grid enabled LIGO to run PyCbC, RIFT and Bayeswave workflows to seamlessly run in a combination of owned and opportunistic resources. One of the challenges is enabling the workflows to use several heterogeneous resources in a coordinated and effective way.
△ Less
Submitted 30 November, 2020;
originally announced November 2020.
-
A machine-compiled macroevolutionary history of Phanerozoic life
Authors:
Shanan E. Peters,
Ce Zhang,
Miron Livny,
Christopher RĂ©
Abstract:
Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of palaeontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and e…
▽ More
Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of palaeontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in complex data extraction and inference tasks and generates congruent synthetic macroevolutionary results. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We also show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.
△ Less
Submitted 19 July, 2014; v1 submitted 11 June, 2014;
originally announced June 2014.
-
The CMS Integration Grid Testbed
Authors:
Gregory E. Graham,
M. Anzar Afaq,
Shafqat Aziz,
L. A. T. Bauerdick,
Michael Ernst,
Joseph Kaiser,
Natalia Ratnikova,
Hans Wenzel,
Yujun Wu,
Erik Aslakson,
Julian Bunn,
Saima Iqbal,
Iosif Legrand,
Harvey Newman,
Suresh Singh,
Conrad Steenberg,
James Branson,
Ian Fisk,
James Letts,
Adam Arbree,
Paul Avery,
Dimitri Bourilkov,
Richard Cavanaugh,
Jorge Rodriguez,
Suchindra Kategari
, et al. (5 additional authors not shown)
Abstract:
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed us…
▽ More
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Gridwide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent based Mona Lisa. Domain specific software is packaged and installed using the Distrib ution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuo us two month span in Fall of 2002, over 1 million official CMS GEANT based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids.
△ Less
Submitted 10 June, 2003; v1 submitted 30 May, 2003;
originally announced May 2003.