-
PyHEP.dev 2024 Workshop Summary Report, August 26-30 2024, Aachen, Germany
Authors:
Azzah Alshehri,
Jan Bürger,
Saransh Chopra,
Niclas Eich,
Jonas Eppelt,
Martin Erdmann,
Jonas Eschle,
Peter Fackeldey,
Maté Farkas,
Matthew Feickert,
Tristan Fillinger,
Benjamin Fischer,
Stefan Fröse,
Lino Oscar Gerlach,
Nikolai Hartmann,
Alexander Heidelbach,
Alexander Held,
Marian I Ivanov,
Josué Molina,
Yaroslav Nikitenko,
Ianna Osborne,
Vincenzo Eduardo Padulano,
Jim Pivarski,
Cyrille Praz,
Marcel Rieger,
et al. (6 additional authors not shown)
Abstract:
The second PyHEP.dev workshop, part of the "Python in HEP Developers" series organized by the HEP Software Foundation (HSF), took place in Aachen, Germany, from August 26 to 30, 2024. This gathering brought together nearly 30 Python package developers, maintainers, and power users to engage in informal discussions about current trends in Python, with a primary focus on analysis tools and techniques in High Energy Physics (HEP).
The workshop agenda encompassed a range of topics, such as defining the scope of HEP data analysis, exploring the Analysis Grand Challenge project, evaluating statistical models and serialization methods, assessing workflow management systems, examining histogramming practices, and investigating distributed processing tools like RDataFrame, Coffea, and Dask. Additionally, the workshop dedicated time to brainstorming the organization of future PyHEP.dev events, upholding the tradition of alternating between Europe and the United States as host locations.
This document, prepared by the session conveners in the weeks following the workshop, serves as a summary of the key discussions, salient points, and conclusions that emerged.
Submitted 17 December, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
First performance measurements with the Analysis Grand Challenge
Authors:
Oksana Shadura,
Alexander Held
Abstract:
The IRIS-HEP Analysis Grand Challenge (AGC) is designed to be a realistic environment for investigating how analysis methods scale to the demands of the HL-LHC. The analysis task is based on publicly available Open Data and allows for comparing the usability and performance of different approaches and implementations. It includes all relevant workflow aspects from data delivery to statistical inference.
The reference implementation for the AGC analysis task is heavily based on tools from the HEP Python ecosystem. It makes use of novel pieces of cyberinfrastructure and modern analysis facilities in order to address the data processing challenges of the HL-LHC.
This contribution compares multiple different analysis implementations and studies their performance. Differences between the implementations include the use of multiple data delivery mechanisms and caching setups for the analysis facilities under investigation.
Submitted 11 April, 2023;
originally announced April 2023.
-
Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access
Authors:
W. Bhimji,
D. Carder,
E. Dart,
J. Duarte,
I. Fisk,
R. Gardner,
C. Guok,
B. Jayatilaka,
T. Lehman,
M. Lin,
C. Maltzahn,
S. McKee,
M. S. Neubauer,
O. Rind,
O. Shadura,
N. V. Tran,
P. van Gemmeren,
G. Watts,
B. A. Weaver,
F. Würthwein
Abstract:
Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commercial clouds, federally funded High Performance Computing (HPC) systems for all of science, and systems funded explicitly for a given experimental or theoretical program. This topical group report summarizes the findings and recommendations for the storage, processing, networking and associated software service infrastructures for future high energy physics research, based on the discussions organized through the Snowmass 2021 community study.
Submitted 29 September, 2022; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Collaborative Computing Support for Analysis Facilities Exploiting Software as Infrastructure Techniques
Authors:
Maria Acosta Flechas,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Lindsey Gray,
Burt Holzman,
Carl Lundstedt,
Oksana Shadura,
Nicholas Smith,
John Thiltges
Abstract:
Prior to the public release of Kubernetes it was difficult to conduct joint development of elaborate analysis facilities due to the highly non-homogeneous nature of hardware and network topology across compute facilities. However, since the advent of systems like Kubernetes and OpenShift, which provide declarative interfaces for building fault-tolerant and self-healing deployments of networked software, it is possible for multiple institutes to collaborate more effectively since resource details are abstracted away through various forms of hardware and software virtualization. In this whitepaper we will outline the development of two analysis facilities: "Coffea-casa" at University of Nebraska Lincoln and the "Elastic Analysis Facility" at Fermilab, and how utilizing platform abstraction has improved the development of common software for each of these facilities, and future development plans made possible by this methodology.
Submitted 22 March, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
HL-LHC Computing Review Stage 2, Common Software Projects: Data Science Tools for Analysis
Authors:
Jim Pivarski,
Eduardo Rodrigues,
Kevin Pedro,
Oksana Shadura,
Benjamin Krikler,
Graeme A. Stewart
Abstract:
This paper was prepared by the HEP Software Foundation (HSF) PyHEP Working Group as input to the second phase of the LHCC review of High-Luminosity LHC (HL-LHC) computing, which took place in November 2021. It describes the adoption of Python and data science tools in HEP, discusses the likelihood of future scenarios, and offers recommendations for action by the HEP community.
Submitted 4 February, 2022;
originally announced February 2022.
-
Software Training in HEP
Authors:
Sudhir Malik,
Samuel Meehan,
Kilian Lieret,
Meirin Oan Evans,
Michel H. Villanueva,
Daniel S. Katz,
Graeme A. Stewart,
Peter Elmer,
Sizar Aziz,
Matthew Bellis,
Riccardo Maria Bianchi,
Gianluca Bianco,
Johan Sebastian Bonilla,
Angela Burger,
Jackson Burzynski,
David Chamont,
Matthew Feickert,
Philipp Gadow,
Bernhard Manfred Gruber,
Daniel Guest,
Stephan Hageboeck,
Lukas Heinrich,
Maximilian M. Horzela,
Marc Huwiler,
Clemens Lange,
et al. (22 additional authors not shown)
Abstract:
Long-term sustainability of the high energy physics (HEP) research software ecosystem is essential for the field. With upgrades and new facilities coming online throughout the 2020s, this will only become increasingly relevant throughout this decade. Meeting this sustainability challenge requires a workforce with a combination of HEP domain knowledge and advanced software skills. The required software skills fall into three broad groups. The first is fundamental and generic software engineering (e.g., Unix, version control, C++, continuous integration). The second is knowledge of domain-specific HEP packages and practices (e.g., the ROOT data format and analysis framework). The third is more advanced knowledge involving more specialized techniques. These include parallel programming, machine learning and data science tools, and techniques to preserve software projects at all scales. This paper discusses the collective software training program in HEP and its activities led by the HEP Software Foundation (HSF) and the Institute for Research and Innovation in Software in HEP (IRIS-HEP). The program equips participants with an array of software skills that serve as ingredients from which solutions to the computing challenges of HEP can be formed. Beyond serving the community by ensuring that members are able to pursue research goals, this program serves individuals by providing intellectual capital and transferable skills that are becoming increasingly important to careers in the realm of software and computing, whether inside or outside HEP.
Submitted 6 August, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Authors:
G. Amadio,
A. Ananya,
J. Apostolakis,
M. Bandieramonte,
S. Banerjee,
A. Bhattacharyya,
C. Bianchini,
G. Bitzes,
P. Canal,
F. Carminati,
O. Chaparro-Amaro,
G. Cosmo,
J. C. De Fine Licht,
V. Drogan,
L. Duhem,
D. Elvira,
J. Fuentes,
A. Gheata,
M. Gheata,
M. Gravey,
I. Goulas,
F. Hariri,
S. Y. Jun,
D. Konstantinov,
H. Kumawat,
et al. (17 additional authors not shown)
Abstract:
Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with luminosity increase, compensated only partially by an increase of computing resources. The extension of fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes in order to make them benefit from fine-grained parallelism features such as vectorization, but also from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype.
Submitted 16 September, 2020; v1 submitted 2 May, 2020;
originally announced May 2020.
-
Software Challenges For HL-LHC Data Analysis
Authors:
ROOT Team,
Kim Albertsson Brann,
Guilherme Amadio,
Sitong An,
Bertrand Bellenot,
Jakob Blomer,
Philippe Canal,
Olivier Couet,
Massimiliano Galli,
Enrico Guiraud,
Stephan Hageboeck,
Sergey Linev,
Pere Mato Vila,
Lorenzo Moneta,
Axel Naumann,
Alja Mrak Tadel,
Vincenzo Eduardo Padulano,
Fons Rademakers,
Oksana Shadura,
Matevz Tadel,
Enric Tejedor Saavedra,
Xavier Valls Pla,
Vassil Vassilev,
Stefan Wunsch
Abstract:
The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project has been one of the central software players in high energy physics for decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, to make the best use of the HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and cooperations.
Submitted 4 May, 2020; v1 submitted 16 April, 2020;
originally announced April 2020.
-
SND@LHC
Authors:
SHiP Collaboration,
C. Ahdida,
A. Akmete,
R. Albanese,
A. Alexandrov,
M. Andreini,
A. Anokhina,
S. Aoki,
G. Arduini,
E. Atkin,
N. Azorskiy,
J. J. Back,
A. Bagulya,
F. Baaltasar Dos Santos,
A. Baranov,
F. Bardou,
G. J. Barker,
M. Battistin,
J. Bauche,
A. Bay,
V. Bayliss,
G. Bencivenni,
A. Y. Berdnikov,
Y. A. Berdnikov,
M. Bertani,
et al. (319 additional authors not shown)
Abstract:
We propose to build and operate a detector that, for the first time, will measure the process $pp \to \nu X$ at the LHC and search for feebly interacting particles (FIPs) in an unexplored domain. The TI18 tunnel has been identified as a suitable site to perform these measurements due to very low machine-induced background. The detector will be off-axis with respect to the ATLAS interaction point (IP1) and, given the pseudo-rapidity range accessible, the corresponding neutrinos will mostly come from charm decays: the proposed experiment will thus make the first test of the heavy flavour production in a pseudo-rapidity range that is not accessible by the current LHC detectors. In order to efficiently reconstruct neutrino interactions and identify their flavour, the detector will combine in the target region nuclear emulsion technology with scintillating fibre tracking layers and it will adopt a muon identification system based on scintillating bars that will also play the role of a hadronic calorimeter. The time of flight measurement will be achieved thanks to a dedicated timing detector. The detector will be a small-scale prototype of the scattering and neutrino detector (SND) of the SHiP experiment: the operation of this detector will provide an important test of the neutrino reconstruction in a high occupancy environment.
Submitted 20 February, 2020;
originally announced February 2020.