Julia in HEP

Authors: Graeme Andrew Stewart, Alexander Moreno Briceño, Philippe Gras, Benedikt Hegner, Uwe Hernandez Acosta, Tamas Gal, Jerry Ling, Pere Mato, Mikhail Mikhasenko, Oliver Schulz, Sam Skipsey

Abstract: Julia is a mature general-purpose programming language, with a large ecosystem of libraries and more than 12000 third-party packages, which specifically targets scientific computing. As a language, Julia is as dynamic, interactive, and accessible as Python with NumPy, but achieves run-time performance on par with C/C++. In this paper, we describe the state of adoption of Julia in HEP, where momentum has been gathering over a number of years. HEP-oriented Julia packages can already, via UnROOT.jl, read HEP's major file formats, including TTree and RNTuple. Interfaces to some of HEP's major software packages, such as through Geant4.jl, are available too. Jet reconstruction algorithms in Julia show excellent performance. A number of full HEP analyses have been performed in Julia. We show how, as the support for HEP has matured, developments have benefited from Julia's core design choices, which make reuse from and integration with other packages easy. In particular, libraries developed outside HEP for plotting, statistics, fitting, and scientific machine learning are extremely useful. We believe that the powerful combination of flexibility and speed, the wide selection of scientific programming tools, and support for all modern programming paradigms and tools, make Julia the ideal choice for a future language in HEP.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:1712.06982 [pdf, other]

doi 10.1007/s41781-018-0018-8

A Roadmap for HEP Software and Computing R&D for the 2020s

Authors: Johannes Albrecht, Antonio Augusto Alves Jr, Guilherme Amadio, Giuseppe Andronico, Nguyen Anh-Ky, Laurent Aphecetche, John Apostolakis, Makoto Asai, Luca Atzori, Marian Babik, Giuseppe Bagliesi, Marilena Bandieramonte, Sunanda Banerjee, Martin Barisits, Lothar A. T. Bauerdick, Stefano Belforte, Douglas Benjamin, Catrin Bernius, Wahid Bhimji, Riccardo Maria Bianchi, Ian Bird, Catherine Biscarat, Jakob Blomer, Kenneth Bloom, Tommaso Boccali, et al. (285 additional authors not shown)

Abstract: Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

Submitted 19 December, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

Report number: HSF-CWP-2017-01

Journal ref: Comput Softw Big Sci (2019) 3, 7

arXiv:1512.00272 [pdf, other]

doi 10.1088/1742-6596/664/4/042052

Enabling Object Storage via shims for Grid Middleware

Authors: Samuel Cadellin Skipsey, Shaun De Witt, Alastair Dewhurst, David Britton, Gareth Roy, David Crooks

Abstract: The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for "POSIX-like" filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present an evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently-popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow.

Submitted 30 October, 2015; originally announced December 2015.

Comments: 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)

arXiv:1510.09117 [pdf, other]

doi 10.1088/1742-6596/664/4/042051

Extending DIRAC File Management with Erasure-Coding for efficient storage

Authors: Samuel Cadellin Skipsey, Paulin Todev, David Britton, David Crooks, Gareth Roy

Abstract: The state of the art in Grid style data management is to achieve increased resilience of data via multiple complete replicas of data files across multiple storage endpoints. While this is effective, it is not the most space-efficient approach to resilience, especially when the reliability of individual storage endpoints is sufficiently high that only a few will be inactive at any point in time. We report on work performed as part of GridPP, extending the DIRAC File Catalogue and file management interface to allow the placement of erasure-coded files: each file distributed as N identically-sized chunks of data striped across a vector of storage endpoints, encoded such that any M chunks can be lost and the original file can be reconstructed. The tools developed are transparent to the user, and, as well as allowing uploading and downloading of data to Grid storage, also provide the possibility of parallelising access across all of the distributed chunks at once, improving data transfer and IO performance. We expect this approach to be of most interest to smaller VOs, who have tighter bounds on the storage available to them, but larger (WLCG) VOs may be interested as their total data increases during Run 2. We provide an analysis of the costs and benefits of the approach, along with future development and implementation plans in this area. In general, overheads for multiple file transfers are the largest obstacle to the competitiveness of this approach at present.

Submitted 30 October, 2015; originally announced October 2015.

Comments: 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015)
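
Note on the erasure-coding entry above: the abstract's space-efficiency argument can be illustrated with a little arithmetic. The sketch below is not taken from the paper; it assumes an ideal (maximum-distance-separable) code, so that a file split into N equal chunks and tolerating the loss of any M of them stores N/(N-M) bytes per byte of original data, while tolerating M lost endpoints with complete replicas needs M+1 copies. The function names are hypothetical.

```python
from fractions import Fraction


def erasure_overhead(n_chunks: int, m_lost: int) -> Fraction:
    """Bytes stored per byte of original data for N chunks tolerating M losses.

    Assumption (not from the paper): an ideal MDS code, so any N - M
    chunks together hold exactly one file's worth of data.
    """
    if not 0 <= m_lost < n_chunks:
        raise ValueError("need 0 <= M < N")
    return Fraction(n_chunks, n_chunks - m_lost)


def replication_overhead(m_lost: int) -> Fraction:
    """Bytes stored per byte when M lost endpoints are tolerated with full replicas."""
    if m_lost < 0:
        raise ValueError("need M >= 0")
    # Surviving M losses requires M + 1 complete copies of the file.
    return Fraction(m_lost + 1)


if __name__ == "__main__":
    # Illustrative (N, M) pairs only; the paper's actual parameters are not given here.
    for n, m in [(10, 2), (6, 3)]:
        print(
            f"N={n}, M={m}: erasure-coded {float(erasure_overhead(n, m)):.2f}x, "
            f"replicated {float(replication_overhead(m)):.2f}x"
        )
```

Under these assumptions the saving grows with the number of endpoints available for striping, which is consistent with the abstract's point that the approach suits VOs with tight storage bounds; it does not account for the transfer overheads the abstract identifies as the main cost.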

arXiv:1102.3114 [pdf, ps, other]

doi 10.1088/1742-6596/331/5/052019

Establishing Applicability of SSDs to LHC Tier-2 Hardware Configuration

Authors: Samuel C Skipsey, Wahid Bhimji, Mike Kenyon

Abstract: Solid State Disk technologies are increasingly replacing high-speed hard disks as the storage technology in high-random-I/O environments. There are several potentially I/O bound services within the typical LHC Tier-2: in the back-end, with the trend towards many-core architectures continuing, worker nodes running many single-threaded jobs and storage nodes delivering many simultaneous files can both exhibit I/O limited efficiency. We estimate the effectiveness of affordable SSDs in the context of worker nodes, on a large Tier-2 production setup using both low level tools and real LHC I/O intensive data analysis jobs, comparing and contrasting with high performance spinning disk based solutions. We consider the applicability of each solution in the context of its price/performance metrics, with an eye on the pragmatic issues facing Tier-2 provision and upgrades.

Submitted 15 February, 2011; originally announced February 2011.

Comments: 6 pages, 1 figure, 4 tables. Conference proceedings for CHEP2010

arXiv:0910.4510 [pdf, ps, other]

doi 10.1088/1742-6596/219/6/062066

Optimised access to user analysis data using the gLite DPM

Authors: Sam Skipsey, Greig Cowan, Mike Kenyon, Stuart Purdie, Graeme Stewart

Abstract: The ScotGrid distributed Tier-2 now provides more than 4MSI2K and 500TB for LHC computing, which is spread across three sites at Durham, Edinburgh and Glasgow. Tier-2 sites have a dual role to play in the computing models of the LHC VOs. Firstly, their CPU resources are used for the generation of Monte Carlo event data. Secondly, the end user analysis data is distributed across the grid to the site's storage system and held on disk ready for processing by physicists' analysis jobs. In this paper we show how we have designed the ScotGrid storage and data management resources in order to optimise access by physicists to LHC data. Within ScotGrid, all sites use the gLite DPM storage manager middleware. Using the EGEE grid to submit real ATLAS analysis code to process VO data stored on the ScotGrid sites, we present an analysis of the performance of the architecture at one site, and procedures that may be undertaken to improve it. The results will be presented from the point of view of the end user (in terms of number of events processed/second) and from the point of view of the site, which wishes to minimise load and the impact that analysis activity has on other users of the system.

Submitted 23 October, 2009; originally announced October 2009.

Comments: 8 pages, 9 figures, preprint for 17th International Conference on Computing in High Energy and Nuclear Physics

Journal ref: J.Phys.Conf.Ser.219:062066,2010

arXiv:0910.4507 [pdf, ps, other]

doi 10.1088/1742-6596/219/5/052014

ScotGrid: Providing an Effective Distributed Tier-2 in the LHC Era

Authors: Sam Skipsey, David Ambrose-Griffith, Greig Cowan, Mike Kenyon, Orlando Richards, Phil Roffe, Graeme Stewart

Abstract: ScotGrid is a distributed Tier-2 centre in the UK with sites in Durham, Edinburgh and Glasgow. ScotGrid has undergone a huge expansion in hardware in anticipation of the LHC and now provides more than 4MSI2K and 500TB to the LHC VOs. Scaling up to this level of provision has brought many challenges to the Tier-2 and we show in this paper how we have adopted new methods of organising the centres, from fabric management and monitoring to remote management of sites to management and operational procedures, to meet these challenges. We describe how we have coped with different operational models at the sites, where Glasgow and Durham sites are managed "in house" but resources at Edinburgh are managed as a central university resource. This required the adoption of a different fabric management model at Edinburgh and a special engagement with the cluster managers. Challenges arose from the different job models of local and grid submission that required special attention to resolve. We show how ScotGrid has successfully provided an infrastructure for ATLAS and LHCb Monte Carlo production. Special attention has been paid to ensuring that user analysis functions efficiently, which has required optimisation of local storage and networking to cope with the demands of user analysis. Finally, although these Tier-2 resources are pledged to the whole VO, we have established close links with our local physics user communities as being the best way to ensure that the Tier-2 functions effectively as a part of the LHC grid computing framework.

Submitted 23 October, 2009; originally announced October 2009.

Comments: Preprint for 17th International Conference on Computing in High Energy and Nuclear Physics, 7 pages, 1 figure

Report number: GLAS-PPE/2009-07

Journal ref: J.Phys.Conf.Ser.219:052014,2010

Showing 1–7 of 7 results for author: Skipsey, S