-
A Ceph S3 Object Data Store for HEP
Authors:
Nick Smith,
Bo Jayatilaka,
David Mason,
Oliver Gutsche,
Alison Peisker,
Robert Illingworth,
Chris Jones
Abstract:
We present a novel data format design that obviates the need for data tiers by storing individual event data products in column objects. The objects are stored and retrieved through Ceph S3 technology, with a layout designed to minimize metadata volume and maximize data processing parallelism. Performance benchmarks of data storage and retrieval are presented.
Submitted 27 November, 2023;
originally announced November 2023.
-
Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access
Authors:
W. Bhimji,
D. Carder,
E. Dart,
J. Duarte,
I. Fisk,
R. Gardner,
C. Guok,
B. Jayatilaka,
T. Lehman,
M. Lin,
C. Maltzahn,
S. McKee,
M. S. Neubauer,
O. Rind,
O. Shadura,
N. V. Tran,
P. van Gemmeren,
G. Watts,
B. A. Weaver,
F. Würthwein
Abstract:
Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commercial clouds, federally funded High Performance Computing (HPC) systems for all of science, and systems funded explicitly for a given experimental or theoretical program. This topical group report summarizes the findings and recommendations for the storage, processing, networking and associated software service infrastructures for future high energy physics research, based on the discussions organized through the Snowmass 2021 community study.
Submitted 29 September, 2022; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Learning from the Pandemic: the Future of Meetings in HEP and Beyond
Authors:
Mark S. Neubauer,
Todd Adams,
Jennifer Adelman-McCarthy,
Gabriele Benelli,
Tulika Bose,
David Britton,
Pat Burchat,
Joel Butler,
Timothy A. Cartwright,
Tomáš Davídek,
Jacques Dumarchez,
Peter Elmer,
Matthew Feickert,
Ben Galewsky,
Mandeep Gill,
Maciej Gladki,
Aman Goel,
Jonathan E. Guyer,
Bo Jayatilaka,
Brendan Kiburg,
Benjamin Krikler,
David Lange,
Claire Lee,
Nick Manganelli,
Giovanni Marchiori
, et al. (14 additional authors not shown)
Abstract:
The COVID-19 pandemic has by and large prevented in-person meetings since March 2020. While the increasing deployment of effective vaccines around the world is a very positive development, the timeline and pathway to "normality" are uncertain and the "new normal" we will settle into is anyone's guess. Particle physics, like many other scientific fields, has more than a year of experience in holding virtual meetings, workshops, and conferences. A great deal of experimentation and innovation to explore how to execute these meetings effectively has occurred. Therefore, it is an appropriate time to take stock of what we as a community learned from running virtual meetings and discuss possible strategies for the future. Continuing to develop effective strategies for meetings with a virtual component is likely to be important for reducing the carbon footprint of our research activities, while also enabling greater diversity and inclusion for participation. This report summarizes a virtual two-day workshop on Virtual Meetings held May 5-6, 2021, which brought together experts from both inside and outside of high-energy physics to share their experiences and practices with organizing and executing virtual workshops, and to develop possible strategies for future meetings as we begin to emerge from the COVID-19 pandemic. This report outlines some of the practices and tools that have worked well, which we hope will serve as a valuable resource for future virtual meeting organizers in all scientific fields.
Submitted 29 June, 2021;
originally announced June 2021.
-
HEP Software Foundation Community White Paper Working Group -- Data Organization, Management and Access (DOMA)
Authors:
Dario Berzano,
Riccardo Maria Bianchi,
Ian Bird,
Brian Bockelman,
Simone Campana,
Kaushik De,
Dirk Duellmann,
Peter Elmer,
Robert Gardner,
Vincent Garonne,
Claudio Grandi,
Oliver Gutsche,
Andrew Hanushevsky,
Burt Holzman,
Bodhitha Jayatilaka,
Ivo Jimenez,
Michel Jouvin,
Oliver Keeble,
Alexei Klimentov,
Valentin Kuznetsov,
Eric Lancon,
Mario Lassnig,
Miron Livny,
Carlos Maltzahn,
Shawn McKee
, et al. (13 additional authors not shown)
Abstract:
Without significant changes to data organization, management, and access (DOMA), HEP experiments will find scientific output limited by how fast data can be accessed and digested by computational resources. In this white paper we discuss challenges in DOMA that HEP experiments, such as the HL-LHC, will face as well as potential ways to address them. A research and development timeline to assess these changes is also proposed.
Submitted 30 November, 2018;
originally announced December 2018.
-
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
Authors:
Lothar Bauerdick,
Riccardo Maria Bianchi,
Brian Bockelman,
Nuno Castro,
Kyle Cranmer,
Peter Elmer,
Robert Gardner,
Maria Girone,
Oliver Gutsche,
Benedikt Hegner,
José M. Hernández,
Bodhitha Jayatilaka,
David Lange,
Mark S. Neubauer,
Daniel S. Katz,
Lukasz Kreczko,
James Letts,
Shawn McKee,
Christoph Paus,
Kevin Pedro,
Jim Pivarski,
Martin Ritter,
Eduardo Rodrigues,
Tai Sakuma,
Elizabeth Sexton-Kennedy
, et al. (4 additional authors not shown)
Abstract:
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
Submitted 9 April, 2018;
originally announced April 2018.
-
A Roadmap for HEP Software and Computing R&D for the 2020s
Authors:
Johannes Albrecht,
Antonio Augusto Alves Jr,
Guilherme Amadio,
Giuseppe Andronico,
Nguyen Anh-Ky,
Laurent Aphecetche,
John Apostolakis,
Makoto Asai,
Luca Atzori,
Marian Babik,
Giuseppe Bagliesi,
Marilena Bandieramonte,
Sunanda Banerjee,
Martin Barisits,
Lothar A. T. Bauerdick,
Stefano Belforte,
Douglas Benjamin,
Catrin Bernius,
Wahid Bhimji,
Riccardo Maria Bianchi,
Ian Bird,
Catherine Biscarat,
Jakob Blomer,
Kenneth Bloom,
Tommaso Boccali
, et al. (285 additional authors not shown)
Abstract:
Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
Submitted 19 December, 2018; v1 submitted 18 December, 2017;
originally announced December 2017.
-
Data preservation at the Fermilab Tevatron
Authors:
S. Amerio,
S. Behari,
J. Boyd,
M. Brochmann,
R. Culbertson,
M. Diesburg,
J. Freeman,
L. Garren,
H. Greenlee,
K. Herner,
R. Illingworth,
B. Jayatilaka,
A. Jonckheere,
Q. Li,
S. Naymola,
G. Oleynik,
W. Sakumoto,
E. Varnes,
C. Vellidis,
G. Watts,
S. White
Abstract:
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have approximately 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 and beyond. To achieve this goal, we have implemented a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology and leverages resources available from currently running experiments at Fermilab. These efforts have also provided useful lessons in ensuring long-term data access for numerous experiments, and enable high-quality scientific output for years to come.
Submitted 26 January, 2017;
originally announced January 2017.
-
Data processing model for the CDF experiment
Authors:
J. Antos,
M. Babik,
D. Benjamin,
S. Cabrera,
A. W. Chan,
Y. C. Chen,
M. Coca,
B. Cooper,
S. Farrington,
K. Genser,
K. Hatakeyama,
S. Hou,
T. L. Hsieh,
B. Jayatilaka,
S. Y. Jun,
A. V. Kotwal,
A. C. Kraan,
R. Lysak,
I. V. Mandrichenko,
P. Murat,
A. Robson,
P. Savard,
M. Siket,
B. Stelzer,
J. Syu
, et al. (5 additional authors not shown)
Abstract:
The data processing model for the CDF experiment is described. Data processing reconstructs events from parallel data streams taken with different combinations of physics event triggers and further splits the events into specialized physics datasets. The design of the processing control system faces strict requirements on bookkeeping records, which trace the status of data files and event contents during processing and storage. The computing architecture was updated to meet the mass data flow of the Run II data collection, recently upgraded to a maximum rate of 40 MByte/sec. The data processing facility consists of a large cluster of Linux computers with data movement managed by the CDF data handling system to a multi-petabyte Enstore tape library. The latest processing cycle has achieved a stable speed of 35 MByte/sec (3 TByte/day). It can be readily scaled by increasing CPU and data-handling capacity as required.
Submitted 9 June, 2006; v1 submitted 5 June, 2006;
originally announced June 2006.
-
Data production models for the CDF experiment
Authors:
J. Antos,
M. Babik,
D. Benjamin,
S. Cabrera,
A. W. Chan,
Y. C. Chen,
M. Coca,
B. Cooper,
K. Genser,
K. Hatakeyama,
S. Hou,
T. L. Hsieh,
B. Jayatilaka,
A. C. Kraan,
R. Lysak,
I. V. Mandrichenko,
A. Robson,
M. Siket,
B. Stelzer,
J. Syu,
P. K. Teng,
S. C. Timm,
T. Tomura,
E. Vataga,
S. A. Wolbers
, et al. (1 additional author not shown)
Abstract:
The data production for the CDF experiment is conducted on a large Linux PC farm designed to meet the needs of data collection at a maximum rate of 40 MByte/sec. We present two data production models that exploit advances in computing and communication technology. The first production farm is a centralized system that has achieved a stable data processing rate of approximately 2 TByte per day. The recently upgraded farm has been migrated to the SAM (Sequential Access to data via Metadata) data handling system. The software and hardware of the CDF production farms have been successful in providing large computing and data throughput capacity to the experiment.
Submitted 5 June, 2006;
originally announced June 2006.