-
Integrating the PanDA Workload Management System with the Vera C. Rubin Observatory
Authors:
Edward Karavakis,
Wen Guan,
Zhaoyu Yang,
Tadashi Maeno,
Torre Wenaus,
Jennifer Adelman-McCarthy,
Fernando Barreiro Megino,
Kaushik De,
Richard Dubois,
Michelle Gower,
Tim Jenness,
Alexei Klimentov,
Tatiana Korchuganova,
Mikolaj Kowalik,
Fa-Hui Lin,
Paul Nilsson,
Sergey Padolski,
Wei Yang,
Shuwei Ye
Abstract:
The Vera C. Rubin Observatory will produce an unprecedented astronomical data set for studies of the deep and dynamic universe. Its Legacy Survey of Space and Time (LSST) will image the entire southern sky every three to four days and produce tens of petabytes of raw image data and associated calibration data over the course of the experiment's run. More than 20 terabytes of data must be stored every night, and annual campaigns to reprocess the entire dataset since the beginning of the survey will be conducted over ten years. The Production and Distributed Analysis (PanDA) system was evaluated by the Rubin Observatory Data Management team and selected to serve the Observatory's needs because of its demonstrated scalability and flexibility over the years, its Directed Acyclic Graph (DAG) support, its support for multi-site processing, and its support for highly scalable complex workflows via the intelligent Data Delivery Service (iDDS). PanDA is also being evaluated for prompt processing, where data must be processed within 60 seconds of image capture. This paper briefly describes the Rubin Data Management system and its Data Facilities (DFs). It then describes in depth the work performed to integrate the PanDA system with the Rubin Observatory so that the Rubin Science Pipelines can be run using PanDA.
Submitted 8 December, 2023;
originally announced December 2023.
-
Data management and execution systems for the Rubin Observatory Science Pipelines
Authors:
Nate B. Lust,
Tim Jenness,
James F. Bosch,
Andrei Salnikov,
Nathan M. Pease,
Michelle Gower,
Mikolaj Kowalik,
Gregory P. Dubois-Felsmann,
Fritz Mueller,
Pim Schellart
Abstract:
We present the Rubin Observatory system for data storage/retrieval and pipelined code execution. The layer for data storage and retrieval is named the Butler. It consists of a relational database, known as the registry, to keep track of metadata and relations, and a system to manage where the data is located, named the datastore. Together these systems create an abstraction layer that science algorithms can be written against. This abstraction layer manages the complexities of the large data volumes expected and allows algorithms to be written independently, yet be tied together automatically into a coherent processing pipeline. This system consists of tools which execute these pipelines by transforming them into execution graphs which contain concrete data stored in the Butler. The pipeline infrastructure is designed to be scalable in nature, allowing execution on environments ranging from a laptop all the way up to multi-facility data centers. This presentation will focus on the data management aspects as well as an overview of the creation of pipelines and the corresponding execution graphs.
Submitted 6 March, 2023;
originally announced March 2023.
-
Adding Workflow Management Flexibility to LSST Pipelines Execution
Authors:
Michelle Gower,
Mikolaj Kowalik,
Nate B. Lust,
James F. Bosch,
Tim Jenness
Abstract:
Data processing pipelines need to be executed at scales ranging from small runs up through large production data release runs resulting in millions of data products. As part of the Rubin Observatory's pipeline execution system, BPS is the abstraction layer that provides an interface to different Workflow Management Systems (WMS) such as HTCondor and PanDA. During the submission process, the pipeline execution system interacts with the Data Butler to produce a science-oriented execution graph from algorithmic tasks. BPS converts this execution graph to a workflow graph and then uses a WMS-specific plugin to submit and manage the workflow. Here we will discuss the architectural design of this interface and report briefly on the recent production of the Data Preview 0.2 release and how the system is used by pipeline developers.
Submitted 28 November, 2022;
originally announced November 2022.
-
The Vera C. Rubin Observatory Data Butler and Pipeline Execution System
Authors:
Tim Jenness,
James F. Bosch,
Nate B. Lust,
Nathan M. Pease,
Michelle Gower,
Mikolaj Kowalik,
Gregory P. Dubois-Felsmann,
Fritz Mueller,
Pim Schellart
Abstract:
The Rubin Observatory's Data Butler is designed to allow data file location and file formats to be abstracted away from the people writing the science pipeline algorithms. The Butler works in conjunction with the workflow graph builder to allow pipelines to be constructed from the algorithmic tasks. These pipelines can be executed at scale using object stores and multi-node clusters, or on a laptop using a local file system. The Butler and pipeline system are now in daily use during Rubin construction and early operations.
Submitted 29 June, 2022;
originally announced June 2022.
-
Rubin Science Platform on Google: the story so far
Authors:
William O'Mullane,
Frossie Economou,
Flora Huang,
Dan Speck,
Hsin-Fang Chiang,
Melissa L. Graham,
Russ Allbery,
Christine Banek,
Jonathan Sick,
Adam J. Thornton,
Jess Masciarelli,
Kian-Tat Lim,
Fritz Mueller,
Sergey Padolski,
Tim Jenness,
K. Simon Krughoff,
Michelle Gower,
Leanne P. Guy,
Gregory P. Dubois-Felsmann
Abstract:
We describe Rubin Observatory's experience with offering a data access facility (and associated services including our Science Platform) deployed on Google Cloud infrastructure as part of our pre-Operations Data Preview program.
Submitted 29 November, 2021;
originally announced November 2021.
-
Abstracting the storage and retrieval of image data at the LSST
Authors:
Tim Jenness,
James F. Bosch,
Pim Schellart,
Kian-Tat Lim,
Andrei Salnikov,
Michelle Gower
Abstract:
Writing generic data processing pipelines requires that the algorithmic code does not ever have to know about data formats of files, or the locations of those files. At LSST we have a software system known as "the Data Butler," that abstracts these details from the software developer. Scientists can specify the dataset they want in terms they understand, such as filter, observation identifier, date of observation, and instrument name, and the Butler translates that to one or more files which are read and returned to them as a single Python object. Conversely, once they have created a new dataset they can give it back to the Butler, with a label describing its new status, and the Butler can write it in whatever format it has been configured to use. All configuration is in YAML and supports standard defaults whilst allowing overrides.
Submitted 19 December, 2018;
originally announced December 2018.
-
The Dark Energy Survey Data Release 1
Authors:
T. M. C. Abbott,
F. B. Abdalla,
S. Allam,
A. Amara,
J. Annis,
J. Asorey,
S. Avila,
O. Ballester,
M. Banerji,
W. Barkhouse,
L. Baruah,
M. Baumer,
K. Bechtol,
M. R. Becker,
A. Benoit-Lévy,
G. M. Bernstein,
E. Bertin,
J. Blazek,
S. Bocquet,
D. Brooks,
D. Brout,
E. Buckley-Geer,
D. L. Burke,
V. Busti,
R. Campisano, et al. (177 additional authors not shown)
Abstract:
We describe the first public data release of the Dark Energy Survey, DES DR1, consisting of reduced single epoch images, coadded images, coadded source catalogs, and associated products and services assembled over the first three years of DES science operations. DES DR1 is based on optical/near-infrared imaging from 345 distinct nights (August 2013 to February 2016) by the Dark Energy Camera mounted on the 4-m Blanco telescope at Cerro Tololo Inter-American Observatory in Chile. We release data from the DES wide-area survey covering ~5,000 sq. deg. of the southern Galactic cap in five broad photometric bands, grizY. DES DR1 has a median delivered point-spread function of g = 1.12, r = 0.96, i = 0.88, z = 0.84, and Y = 0.90 arcsec FWHM, a photometric precision of < 1% in all bands, and an astrometric precision of 151 mas. The median coadded catalog depth for a 1.95" diameter aperture at S/N = 10 is g = 24.33, r = 24.08, i = 23.44, z = 22.69, and Y = 21.44 mag. DES DR1 includes nearly 400M distinct astronomical objects detected in ~10,000 coadd tiles of size 0.534 sq. deg. produced from ~39,000 individual exposures. Benchmark galaxy and stellar samples contain ~310M and ~80M objects, respectively, following a basic object quality selection. These data are accessible through a range of interfaces, including query web clients, image cutout servers, Jupyter notebooks, and an interactive coadd image visualization tool. DES DR1 constitutes the largest photometric data set to date at the achieved depth and photometric precision.
Submitted 23 April, 2019; v1 submitted 9 January, 2018;
originally announced January 2018.
-
The Dark Energy Survey Image Processing Pipeline
Authors:
E. Morganson,
R. A. Gruendl,
F. Menanteau,
M. Carrasco Kind,
Y.-C. Chen,
G. Daues,
A. Drlica-Wagner,
D. N. Friedel,
M. Gower,
M. W. G. Johnson,
M. D. Johnson,
R. Kessler,
F. Paz-Chinchón,
D. Petravick,
C. Pond,
B. Yanny,
S. Allam,
R. Armstrong,
W. Barkhouse,
K. Bechtol,
A. Benoit-Lévy,
G. M. Bernstein,
E. Bertin,
E. Buckley-Geer,
R. Covarrubias, et al. (18 additional authors not shown)
Abstract:
The Dark Energy Survey (DES) is a five-year optical imaging campaign with the goal of understanding the origin of cosmic acceleration. DES performs a 5000 square degree survey of the southern sky in five optical bands (g,r,i,z,Y) to a depth of ~24th magnitude. Contemporaneously, DES performs a deep, time-domain survey in four optical bands (g,r,i,z) over 27 square degrees. DES exposures are processed nightly with an evolving data reduction pipeline and evaluated for image quality to determine if they need to be retaken. Difference imaging and transient source detection are also performed in the time domain component nightly. On a bi-annual basis, DES exposures are reprocessed with a refined pipeline and coadded to maximize imaging depth. Here we describe the DES image processing pipeline in support of DES science, as a reference for users of archival DES data, and as a guide for future astronomical surveys.
Submitted 9 January, 2018;
originally announced January 2018.
-
The Dark Energy Survey Data Processing and Calibration System
Authors:
Joseph J. Mohr,
Robert Armstrong,
Emmanuel Bertin,
Gregory E. Daues,
Shantanu Desai,
Michelle Gower,
Robert Gruendl,
William Hanlon,
Nikolay Kuropatkin,
Huan Lin,
John Marriner,
Don Petravick,
Ignacio Sevilla,
Molly Swanson,
Todd Tomashek,
Douglas Tucker,
Brian Yanny,
the Dark Energy Survey Collaboration
Abstract:
The Dark Energy Survey (DES) is a 5000 deg² grizY survey reaching characteristic photometric depths of 24th magnitude (10 sigma) and enabling accurate photometry and morphology of objects ten times fainter than in SDSS. Preparations for DES have included building a dedicated 3 deg² CCD camera (DECam), upgrading the existing CTIO Blanco 4m telescope and developing a new high performance computing (HPC) enabled data management system (DESDM).
The DESDM system will be used for processing, calibrating and serving the DES data. The total data volumes are high (~2PB), and so considerable effort has gone into designing an automated processing and quality control system. Special purpose image detrending and photometric calibration codes have been developed to meet the data quality requirements, while survey astrometric calibration, coaddition and cataloging rely on new extensions of the AstrOmatic codes which now include tools for PSF modeling, PSF homogenization, PSF corrected model fitting cataloging and joint model fitting across multiple input images.
The DESDM system has been deployed on dedicated development clusters and HPC systems in the US and Germany. An extensive program of testing with small rapid turn-around and larger campaign simulated datasets has been carried out. The system has also been tested on large real datasets, including Blanco Cosmology Survey data from the Mosaic2 camera. In Fall 2012 the DESDM system will be used for DECam commissioning, and, thereafter, the system will go into full science operations.
Submitted 13 July, 2012;
originally announced July 2012.
-
The Dark Energy Survey Data Management System
Authors:
I. Sevilla,
R. Armstrong,
E. Bertin,
A. Carlson,
G. Daues,
S. Desai,
M. Gower,
R. Gruendl,
W. Hanlon,
M. Jarvis,
R. Kessler,
N. Kuropatkin,
H. Lin,
J. Marriner,
J. Mohr,
D. Petravick,
E. Sheldon,
M. E. C. Swanson,
T. Tomashek,
D. Tucker,
Y. Yang,
B. Yanny
Abstract:
The Dark Energy Survey (DES) is a project with the goal of building, installing and exploiting a new 74-CCD camera at the Blanco telescope, in order to study the nature of cosmic acceleration. It will cover 5000 square degrees of the southern hemisphere sky and will record the positions and shapes of 300 million galaxies up to redshift 1.4. The survey will be completed using 525 nights during a 5-year period starting in 2012. O(1 TB) of raw data will be produced every night, including science and calibration images. The DES data management system has been designed for the processing, calibration and archiving of these data. It is being developed by collaborating DES institutions, led by NCSA. In this contribution, we describe the basic functions of the system, what kind of scientific codes are involved and how the Data Challenge process works, to improve simultaneously the Data Management system algorithms and the Science Working Group analysis codes.
Submitted 30 September, 2011;
originally announced September 2011.
-
The Dark Energy Survey Data Management System: The Processing Framework
Authors:
Michelle Gower,
Joseph J. Mohr,
Darren Adams,
Y. Dora Cai,
Gregory E. Daues,
Tony Darnell,
Chow-Choong Ngeow,
Shantanu Desai,
Cristina Beldica,
Mike Freemon,
Huan Lin,
Eric H. Neilsen,
Douglas Tucker,
Emmanuel Bertin,
Luiz A. Nicolaci da Costa,
Leandro Martelli,
Ricardo L. C. Ogando,
Michael Jarvis,
Erin Sheldon
Abstract:
The Dark Energy Survey Data Management (DESDM) system will process and archive the data from the Dark Energy Survey (DES) over the five-year period of operation. This paper focuses on a new adaptable processing framework developed to perform highly automated, high performance data parallel processing. The new processing framework has been used to process 45 nights of simulated DECam supernova imaging data, and was extensively used in the DES Data Challenge 4, where it was used to process thousands of square degrees of simulated DES data.
Submitted 5 May, 2009;
originally announced May 2009.
-
The Dark Energy Survey Data Management System
Authors:
Joseph J. Mohr,
Wayne Barkhouse,
Cristina Beldica,
Emmanuel Bertin,
Y. Dora Cai,
Luiz da Costa,
J. Anthony Darnell,
Gregory E. Daues,
Michael Jarvis,
Michelle Gower,
Huan Lin,
Leandro Martelli,
Eric Neilsen,
Chow-Choong Ngeow,
Ricardo Ogando,
Alex Parga,
Erin Sheldon,
Douglas Tucker,
Nikolay Kuropatkin,
Chris Stoughton
Abstract:
The Dark Energy Survey collaboration will study cosmic acceleration with a 5000 deg² griZY survey in the southern sky over 525 nights from 2011-2016. The DES data management (DESDM) system will be used to process and archive these data and the resulting science ready data products. The DESDM system consists of an integrated archive, a processing framework, an ensemble of astronomy codes and a data access framework. We are developing the DESDM system for operation in the high performance computing (HPC) environments at NCSA and Fermilab. Operating the DESDM system in an HPC environment offers both speed and flexibility. We will employ it for our regular nightly processing needs, and for more compute-intensive tasks such as large scale image coaddition campaigns, extraction of weak lensing shear from the full survey dataset, and massive seasonal reprocessing of the DES data. Data products will be available to the Collaboration and later to the public through a virtual-observatory compatible web portal. Our approach leverages investments in publicly available HPC systems, greatly reducing hardware and maintenance costs to the project, which must deploy and maintain only the storage, database platforms and orchestration and web portal nodes that are specific to DESDM. In Fall 2007, we tested the current DESDM system on both simulated and real survey data. We used Teragrid to process 10 simulated DES nights (3TB of raw data), ingesting and calibrating approximately 250 million objects into the DES Archive database. We also used DESDM to process and calibrate over 50 nights of survey data acquired with the Mosaic2 camera. Comparison to truth tables in the case of the simulated data and internal crosschecks in the case of the real data indicate that astrometric and photometric data quality is excellent.
Submitted 16 July, 2008;
originally announced July 2008.