Integrating the PanDA Workload Management System with the Vera C. Rubin Observatory

Authors: Edward Karavakis, Wen Guan, Zhaoyu Yang, Tadashi Maeno, Torre Wenaus, Jennifer Adelman-McCarthy, Fernando Barreiro Megino, Kaushik De, Richard Dubois, Michelle Gower, Tim Jenness, Alexei Klimentov, Tatiana Korchuganova, Mikolaj Kowalik, Fa-Hui Lin, Paul Nilsson, Sergey Padolski, Wei Yang, Shuwei Ye

Abstract: The Vera C. Rubin Observatory will produce an unprecedented astronomical data set for studies of the deep and dynamic universe. Its Legacy Survey of Space and Time (LSST) will image the entire southern sky every three to four days and produce tens of petabytes of raw image data and associated calibration data over the course of the experiment's run. More than 20 terabytes of data must be stored every night, and annual campaigns to reprocess the entire dataset since the beginning of the survey will be conducted over ten years. The Production and Distributed Analysis (PanDA) system was evaluated by the Rubin Observatory Data Management team and selected to serve the Observatory's needs due to its demonstrated scalability and flexibility over the years, for its Directed Acyclic Graph (DAG) support, its support for multi-site processing, and its highly scalable complex workflows via the intelligent Data Delivery Service (iDDS). PanDA is also being evaluated for prompt processing where data must be processed within 60 seconds after image capture. This paper will briefly describe the Rubin Data Management system and its Data Facilities (DFs). Finally, it will describe in depth the work performed in order to integrate the PanDA system with the Rubin Observatory to be able to run the Rubin Science Pipelines using PanDA.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 8 pages, 3 figures, 26th International Conference on Computing in High Energy & Nuclear Physics

arXiv:2303.03313 [pdf, other]

Data management and execution systems for the Rubin Observatory Science Pipelines

Authors: Nate B. Lust, Tim Jenness, James F. Bosch, Andrei Salnikov, Nathan M. Pease, Michelle Gower, Mikolaj Kowalik, Gregory P. Dubois-Felsmann, Fritz Mueller, Pim Schellart

Abstract: We present the Rubin Observatory system for data storage/retrieval and pipelined code execution. The layer for data storage and retrieval is named the Butler. It consists of a relational database, known as the registry, to keep track of metadata and relations, and a system to manage where the data is located, named the datastore. Together these systems create an abstraction layer that science algorithms can be written against. This abstraction layer manages the complexities of the large data volumes expected and allows algorithms to be written independently, yet be tied together automatically into a coherent processing pipeline. This system consists of tools which execute these pipelines by transforming them into execution graphs which contain concrete data stored in the Butler. The pipeline infrastructure is designed to be scalable in nature, allowing execution on environments ranging from a laptop all the way up to multi-facility data centers. This presentation will focus on the data management aspects as well as an overview on the creation of pipelines and the corresponding execution graphs.

Submitted 6 March, 2023; originally announced March 2023.

Comments: 4 pages, submitted to Astronomical Data Analysis Software and Systems XXXII, October 2022

arXiv:2211.15795 [pdf, other]

Adding Workflow Management Flexibility to LSST Pipelines Execution

Authors: Michelle Gower, Mikolaj Kowalik, Nate B. Lust, James F. Bosch, Tim Jenness

Abstract: Data processing pipelines need to be executed at scales ranging from small runs up through large production data release runs resulting in millions of data products. As part of the Rubin Observatory's pipeline execution system, BPS is the abstraction layer that provides an interface to different Workflow Management Systems (WMS) such as HTCondor and PanDA. During the submission process, the pipeline execution system interacts with the Data Butler to produce a science-oriented execution graph from algorithmic tasks. BPS converts this execution graph to a workflow graph and then uses a WMS-specific plugin to submit and manage the workflow. Here we will discuss the architectural design of this interface and report briefly on the recent production of the Data Preview 0.2 release and how the system is used by pipeline developers.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 4 pages, submitted to Astronomical Data Analysis Software and Systems XXXII, October 2022

arXiv:2206.14941 [pdf, other]

The Vera C. Rubin Observatory Data Butler and Pipeline Execution System

Authors: Tim Jenness, James F. Bosch, Nate B. Lust, Nathan M. Pease, Michelle Gower, Mikolaj Kowalik, Gregory P. Dubois-Felsmann, Fritz Mueller, Pim Schellart

Abstract: The Rubin Observatory's Data Butler is designed to allow data file location and file formats to be abstracted away from the people writing the science pipeline algorithms. The Butler works in conjunction with the workflow graph builder to allow pipelines to be constructed from the algorithmic tasks. These pipelines can be executed at scale using object stores and multi-node clusters, or on a laptop using a local file system. The Butler and pipeline system are now in daily use during Rubin construction and early operations.

Submitted 29 June, 2022; originally announced June 2022.

Comments: 14 pages, 3 figures, submitted to Proc SPIE 12189, "Software and Cyberinfrastructure for Astronomy VII", Montreal, CA, July 2022

arXiv:2111.15030 [pdf, ps, other]

Rubin Science Platform on Google: the story so far

Authors: William O'Mullane, Frossie Economou, Flora Huang, Dan Speck, Hsin-Fang Chiang, Melissa L. Graham, Russ Allbery, Christine Banek, Jonathan Sick, Adam J. Thornton, Jess Masciarelli, Kian-Tat Lim, Fritz Mueller, Sergey Padolski, Tim Jenness, K. Simon Krughoff, Michelle Gower, Leanne P. Guy, Gregory P. Dubois-Felsmann

Abstract: We describe Rubin Observatory's experience with offering a data access facility (and associated services including our Science Platform) deployed on Google Cloud infrastructure as part of our pre-Operations Data Preview program.

Submitted 29 November, 2021; originally announced November 2021.

Comments: 4 pages, 1 figure

Report number: DMTN-209

Journal ref: Proceedings of ADASS XXXI 2021

arXiv:1812.08085 [pdf, other]

Abstracting the storage and retrieval of image data at the LSST

Authors: Tim Jenness, James F. Bosch, Pim Schellart, Kian-Tat Lim, Andrei Salnikov, Michelle Gower

Abstract: Writing generic data processing pipelines requires that the algorithmic code does not ever have to know about data formats of files, or the locations of those files. At LSST we have a software system known as "the Data Butler," that abstracts these details from the software developer. Scientists can specify the dataset they want in terms they understand, such as filter, observation identifier, date of observation, and instrument name, and the Butler translates that to one or more files which are read and returned to them as a single Python object. Conversely, once they have created a new dataset they can give it back to the Butler, with a label describing its new status, and the Butler can write it in whatever format it has been configured to use. All configuration is in YAML and supports standard defaults whilst allowing overrides.

Submitted 19 December, 2018; originally announced December 2018.

Comments: 4 pages, 1 figure, submitted to proceedings of ADASS XXVIII to be published in ASP Conf. Series

arXiv:1801.03181 [pdf, other]

doi 10.3847/1538-4365/aae9f0

The Dark Energy Survey Data Release 1

Authors: T. M. C. Abbott, F. B. Abdalla, S. Allam, A. Amara, J. Annis, J. Asorey, S. Avila, O. Ballester, M. Banerji, W. Barkhouse, L. Baruah, M. Baumer, K. Bechtol, M. R. Becker, A. Benoit-Lévy, G. M. Bernstein, E. Bertin, J. Blazek, S. Bocquet, D. Brooks, D. Brout, E. Buckley-Geer, D. L. Burke, V. Busti, R. Campisano, et al. (177 additional authors not shown)

Abstract: We describe the first public data release of the Dark Energy Survey, DES DR1, consisting of reduced single epoch images, coadded images, coadded source catalogs, and associated products and services assembled over the first three years of DES science operations. DES DR1 is based on optical/near-infrared imaging from 345 distinct nights (August 2013 to February 2016) by the Dark Energy Camera mounted on the 4-m Blanco telescope at Cerro Tololo Inter-American Observatory in Chile. We release data from the DES wide-area survey covering ~5,000 sq. deg. of the southern Galactic cap in five broad photometric bands, grizY. DES DR1 has a median delivered point-spread function of g = 1.12, r = 0.96, i = 0.88, z = 0.84, and Y = 0.90 arcsec FWHM, a photometric precision of < 1% in all bands, and an astrometric precision of 151 mas. The median coadded catalog depth for a 1.95" diameter aperture at S/N = 10 is g = 24.33, r = 24.08, i = 23.44, z = 22.69, and Y = 21.44 mag. DES DR1 includes nearly 400M distinct astronomical objects detected in ~10,000 coadd tiles of size 0.534 sq. deg. produced from ~39,000 individual exposures. Benchmark galaxy and stellar samples contain ~310M and ~80M objects, respectively, following a basic object quality selection. These data are accessible through a range of interfaces, including query web clients, image cutout servers, Jupyter notebooks, and an interactive coadd image visualization tool. DES DR1 constitutes the largest photometric data set to date at the achieved depth and photometric precision.

Submitted 23 April, 2019; v1 submitted 9 January, 2018; originally announced January 2018.

Comments: 30 pages, 20 Figures. Release page found at this url https://des.ncsa.illinois.edu/releases/dr1

Report number: FERMILAB-PUB-17-603-AE-E

arXiv:1801.03177 [pdf, other]

doi 10.1088/1538-3873/aab4ef

The Dark Energy Survey Image Processing Pipeline

Authors: E. Morganson, R. A. Gruendl, F. Menanteau, M. Carrasco Kind, Y.-C. Chen, G. Daues, A. Drlica-Wagner, D. N. Friedel, M. Gower, M. W. G. Johnson, M. D. Johnson, R. Kessler, F. Paz-Chinchón, D. Petravick, C. Pond, B. Yanny, S. Allam, R. Armstrong, W. Barkhouse, K. Bechtol, A. Benoit-Lévy, G. M. Bernstein, E. Bertin, E. Buckley-Geer, R. Covarrubias, et al. (18 additional authors not shown)

Abstract: The Dark Energy Survey (DES) is a five-year optical imaging campaign with the goal of understanding the origin of cosmic acceleration. DES performs a 5000 square degree survey of the southern sky in five optical bands (g,r,i,z,Y) to a depth of ~24th magnitude. Contemporaneously, DES performs a deep, time-domain survey in four optical bands (g,r,i,z) over 27 square degrees. DES exposures are processed nightly with an evolving data reduction pipeline and evaluated for image quality to determine if they need to be retaken. Difference imaging and transient source detection are also performed in the time domain component nightly. On a bi-annual basis, DES exposures are reprocessed with a refined pipeline and coadded to maximize imaging depth. Here we describe the DES image processing pipeline in support of DES science, as a reference for users of archival DES data, and as a guide for future astronomical surveys.

Submitted 9 January, 2018; originally announced January 2018.

arXiv:1207.3189 [pdf]

doi 10.1117/12.926785

The Dark Energy Survey Data Processing and Calibration System

Authors: Joseph J. Mohr, Robert Armstrong, Emmanuel Bertin, Gregory E. Daues, Shantanu Desai, Michelle Gower, Robert Gruendl, William Hanlon, Nikolay Kuropatkin, Huan Lin, John Marriner, Don Petravick, Ignacio Sevilla, Molly Swanson, Todd Tomashek, Douglas Tucker, Brian Yanny, the Dark Energy Survey Collaboration

Abstract: The Dark Energy Survey (DES) is a 5000 deg2 grizY survey reaching characteristic photometric depths of 24th magnitude (10 sigma) and enabling accurate photometry and morphology of objects ten times fainter than in SDSS. Preparations for DES have included building a dedicated 3 deg2 CCD camera (DECam), upgrading the existing CTIO Blanco 4m telescope and developing a new high performance computing (HPC) enabled data management system (DESDM). The DESDM system will be used for processing, calibrating and serving the DES data. The total data volumes are high (~2PB), and so considerable effort has gone into designing an automated processing and quality control system. Special purpose image detrending and photometric calibration codes have been developed to meet the data quality requirements, while survey astrometric calibration, coaddition and cataloging rely on new extensions of the AstrOmatic codes which now include tools for PSF modeling, PSF homogenization, PSF corrected model fitting cataloging and joint model fitting across multiple input images. The DESDM system has been deployed on dedicated development clusters and HPC systems in the US and Germany. An extensive program of testing with small rapid turn-around and larger campaign simulated datasets has been carried out. The system has also been tested on large real datasets, including Blanco Cosmology Survey data from the Mosaic2 camera. In Fall 2012 the DESDM system will be used for DECam commissioning, and, thereafter, the system will go into full science operations.

Submitted 13 July, 2012; originally announced July 2012.

Comments: 12 pages, submitted for publication in SPIE Proceeding 8451-12

arXiv:1109.6741 [pdf, ps, other]

The Dark Energy Survey Data Management System

Authors: I. Sevilla, R. Armstrong, E. Bertin, A. Carlson, G. Daues, S. Desai, M. Gower, R. Gruendl, W. Hanlon, M. Jarvis, R. Kessler, N. Kuropatkin, H. Lin, J. Marriner, J. Mohr, D. Petravick, E. Sheldon, M. E. C. Swanson, T. Tomashek, D. Tucker, Y. Yang, B. Yanny

Abstract: The Dark Energy Survey (DES) is a project with the goal of building, installing and exploiting a new 74 CCD-camera at the Blanco telescope, in order to study the nature of cosmic acceleration. It will cover 5000 square degrees of the southern hemisphere sky and will record the positions and shapes of 300 million galaxies up to redshift 1.4. The survey will be completed using 525 nights during a 5-year period starting in 2012. About O(1 TB) of raw data will be produced every night, including science and calibration images. The DES data management system has been designed for the processing, calibration and archiving of these data. It is being developed by collaborating DES institutions, led by NCSA. In this contribution, we describe the basic functions of the system, what kind of scientific codes are involved and how the Data Challenge process works, to improve simultaneously the Data Management system algorithms and the Science Working Group analysis codes.

Submitted 30 September, 2011; originally announced September 2011.

Comments: 8 pages, 5 figures, to be published electronically as part of the Proceedings of the APS-DPF 2011 Conference, Providence, RI, August 8-13, 2011

arXiv:0905.0659 [pdf, ps, other]

The Dark Energy Survey Data Management System: The Processing Framework

Authors: Michelle Gower, Joseph J. Mohr, Darren Adams, Y. Dora Cai, Gregory E. Daues, Tony Darnell, Chow-Choong Ngeow, Shantanu Desai, Cristina Beldica, Mike Freemon, Huan Lin, Eric H. Neilsen, Douglas Tucker, Emmanuel Bertin, Luiz A. Nicolaci da Costa, Leandro Martelli, Ricardo L. C. Ogando, Michael Jarvis, Erin Sheldon

Abstract: The Dark Energy Survey Data Management (DESDM) system will process and archive the data from the Dark Energy Survey (DES) over the five year period of operation. This paper focuses on a new adaptable processing framework developed to perform highly automated, high performance data parallel processing. The new processing framework has been used to process 45 nights of simulated DECam supernova imaging data, and was extensively used in the DES Data Challenge 4, where it was used to process thousands of square degrees of simulated DES data.

Submitted 5 May, 2009; originally announced May 2009.

Comments: 4 pages, 1 figure; submitted to ADASS XVII

arXiv:0807.2515 [pdf]

doi 10.1117/12.789550

The Dark Energy Survey Data Management System

Authors: Joseph J. Mohr, Wayne Barkhouse, Cristina Beldica, Emmanuel Bertin, Y. Dora Cai, Luiz da Costa, J. Anthony Darnell, Gregory E. Daues, Michael Jarvis, Michelle Gower, Huan Lin, Leandro Martelli, Eric Neilsen, Chow-Choong Ngeow, Ricardo Ogando, Alex Parga, Erin Sheldon, Douglas Tucker, Nikolay Kuropatkin, Chris Stoughton

Abstract: The Dark Energy Survey collaboration will study cosmic acceleration with a 5000 deg2 griZY survey in the southern sky over 525 nights from 2011-2016. The DES data management (DESDM) system will be used to process and archive these data and the resulting science ready data products. The DESDM system consists of an integrated archive, a processing framework, an ensemble of astronomy codes and a data access framework. We are developing the DESDM system for operation in the high performance computing (HPC) environments at NCSA and Fermilab. Operating the DESDM system in an HPC environment offers both speed and flexibility. We will employ it for our regular nightly processing needs, and for more compute-intensive tasks such as large scale image coaddition campaigns, extraction of weak lensing shear from the full survey dataset, and massive seasonal reprocessing of the DES data. Data products will be available to the Collaboration and later to the public through a virtual-observatory compatible web portal. Our approach leverages investments in publicly available HPC systems, greatly reducing hardware and maintenance costs to the project, which must deploy and maintain only the storage, database platforms and orchestration and web portal nodes that are specific to DESDM. In Fall 2007, we tested the current DESDM system on both simulated and real survey data. We used Teragrid to process 10 simulated DES nights (3TB of raw data), ingesting and calibrating approximately 250 million objects into the DES Archive database. We also used DESDM to process and calibrate over 50 nights of survey data acquired with the Mosaic2 camera. Comparison to truth tables in the case of the simulated data and internal crosschecks in the case of the real data indicate that astrometric and photometric data quality is excellent.

Submitted 16 July, 2008; originally announced July 2008.

Comments: To be published in the proceedings of the SPIE conference on Astronomical Instrumentation (held in Marseille in June 2008). This preprint is made available with the permission of SPIE. Further information together with preprint containing full quality images is available at http://desweb.cosmology.uiuc.edu/wiki

Showing 1–12 of 12 results for author: Gower, M