Search | arXiv e-print repository

HEP Software Foundation Community White Paper Working Group -- Data Organization, Management and Access (DOMA)

Authors: Dario Berzano, Riccardo Maria Bianchi, Ian Bird, Brian Bockelman, Simone Campana, Kaushik De, Dirk Duellmann, Peter Elmer, Robert Gardner, Vincent Garonne, Claudio Grandi, Oliver Gutsche, Andrew Hanushevsky, Burt Holzman, Bodhitha Jayatilaka, Ivo Jimenez, Michel Jouvin, Oliver Keeble, Alexei Klimentov, Valentin Kuznetsov, Eric Lancon, Mario Lassnig, Miron Livny, Carlos Maltzahn, Shawn McKee, et al. (13 additional authors not shown)

Abstract: Without significant changes to data organization, management, and access (DOMA), HEP experiments will find scientific output limited by how fast data can be accessed and digested by computational resources. In this white paper we discuss challenges in DOMA that HEP experiments, such as the HL-LHC, will face as well as potential ways to address them. A research and development timeline to assess these changes is also proposed.

Submitted 30 November, 2018; originally announced December 2018.

Comments: arXiv admin note: text overlap with arXiv:1712.06592

Report number: HSF-CWP-2017-04

arXiv:1807.02875 [pdf, ps, other]

HEP Software Foundation Community White Paper Working Group - Training, Staffing and Careers

Authors: HEP Software Foundation: Dario Berzano, Riccardo Maria Bianchi, Peter Elmer, Sergei V. Gleyzer, John Harvey, Roger Jones, Michel Jouvin, Daniel S. Katz, Sudhir Malik, Dario Menasce, Mark Neubauer, Fernanda Psihas, Albert Puig Navarro, Graeme A. Stewart, Christopher Tunnell, Justin A. Vasel, Sean-Jiun Wang

Abstract: The rapid evolution of technology and the parallel increasing complexity of algorithmic analysis in HEP requires developers to acquire a much larger portfolio of programming skills. Young researchers graduating from universities worldwide currently do not receive adequate preparation in the very diverse fields of modern computing to respond to growing needs of the most advanced experimental challenges. There is a growing consensus in the HEP community on the need for training programmes to bring researchers up to date with new software technologies, in particular in the domains of concurrent programming and artificial intelligence. We review some of the initiatives under way for introducing new training programmes and highlight some of the issues that need to be taken into account for these to be successful.

Submitted 17 January, 2019; v1 submitted 8 July, 2018; originally announced July 2018.

Report number: HSF-CWP-2017-02

arXiv:1407.3063 [pdf, ps, other]

The Need for a Versioned Data Analysis Software Environment

Authors: Jakob Blomer, Dario Berzano, Predrag Buncic, Ioannis Charalampidis, Gerardo Ganis, George Lestaris, René Meusel

Abstract: Scientific results in high-energy physics and in many other fields often rely on complex software stacks. In order to support reproducibility and scrutiny of the results, it is good practice to use open source software and to cite software packages and versions. With ever-growing complexity of scientific software on one side and with IT life-cycles of only a few years on the other side, however, it turns out that despite source code availability the setup and the validation of a minimal usable analysis environment can easily become prohibitively expensive. We argue that there is a substantial gap between merely having access to versioned source code and the ability to create a data analysis runtime environment. In order to preserve all the different variants of the data analysis runtime environment, we developed a snapshotting file system optimized for software distribution. We report on our experience in preserving the analysis environment for high-energy physics such as the software landscape used to discover the Higgs boson at the Large Hadron Collider.

Submitted 11 July, 2014; originally announced July 2014.

arXiv:1404.1814 [pdf, other]

doi 10.1088/1742-6596/513/3/032055

CernVM Online and Cloud Gateway: a uniform interface for CernVM contextualization and deployment

Authors: G. Lestaris, I. Charalampidis, D. Berzano, J. Blomer, P. Buncic, G. Ganis, R. Meusel

Abstract: In a virtualized environment, contextualization is the process of configuring a VM instance for the needs of various deployment use cases. Contextualization in CernVM can be done by passing a handwritten context to the user data field of cloud APIs, when running CernVM on the cloud, or by using CernVM web interface when running the VM locally. CernVM Online is a publicly accessible web interface that unifies these two procedures. A user is able to define, store and share CernVM contexts using CernVM Online and then apply them either in a cloud by using CernVM Cloud Gateway or on a local VM with the single-step pairing mechanism. CernVM Cloud Gateway is a distributed system that provides a single interface to use multiple and different clouds (by location or type, private or public). Cloud gateway has been so far integrated with OpenNebula, CloudStack and EC2 tools interfaces. A user, with access to a number of clouds, can run CernVM cloud agents that will communicate with these clouds using their interfaces, and then use one single interface to deploy and scale CernVM clusters. CernVM clusters are defined in CernVM Online and consist of a set of CernVM instances that are contextualized and can communicate with each other.

Submitted 7 April, 2014; originally announced April 2014.

Comments: Conference paper at the 2013 Computing in High Energy Physics (CHEP) Conference, Amsterdam

arXiv:1402.4623 [pdf]

doi 10.1088/1742-6596/513/3/032007

PROOF as a Service on the Cloud: a Virtual Analysis Facility based on the CernVM ecosystem

Authors: Dario Berzano, Jakob Blomer, Predrag Buncic, Ioannis Charalampidis, Gerardo Ganis, Georgios Lestaris, René Meusel

Abstract: PROOF, the Parallel ROOT Facility, is a ROOT-based framework which enables interactive parallelism for event-based tasks on a cluster of computing nodes. Although PROOF can be used simply from within a ROOT session with no additional requirements, deploying and configuring a PROOF cluster used to be not as straightforward. Recently great efforts have been spent to make the provisioning of generic PROOF analysis facilities with zero configuration, with the added advantages of positively affecting both stability and scalability, making the deployment operations feasible even for the end user. Since a growing amount of large-scale computing resources are nowadays made available by Cloud providers in a virtualized form, we have developed the Virtual PROOF-based Analysis Facility: a cluster appliance combining the solid CernVM ecosystem and PoD (PROOF on Demand), ready to be deployed on the Cloud and leveraging some peculiar Cloud features such as elasticity. We will show how this approach is effective both for sysadmins, who will have little or no configuration to do to run it on their Clouds, and for the end users, who are ultimately in full control of their PROOF cluster and can even easily restart it by themselves in the unfortunate event of a major failure. We will also show how elasticity leads to a more optimal and uniform usage of Cloud resources.

Submitted 19 February, 2014; originally announced February 2014.

Comments: Talk from Computing in High Energy and Nuclear Physics 2013 (CHEP2013), Amsterdam (NL), October 2013, 7 pages, 4 figures

arXiv:1311.2426 [pdf, other]

doi 10.1088/1742-6596/513/3/032009

Micro-CernVM: Slashing the Cost of Building and Deploying Virtual Machines

Authors: J. Blomer, D. Berzano, P. Buncic, I. Charalampidis, G. Ganis, G. Lestaris, R. Meusel, V. Nicolaou

Abstract: The traditional virtual machine building and deployment process is centered around the virtual machine hard disk image. The packages comprising the VM operating system are carefully selected, hard disk images are built for a variety of different hypervisors, and images have to be distributed and decompressed in order to instantiate a virtual machine. Within the HEP community, the CernVM File System has been established in order to decouple the distribution of the experiment software from the building and distribution of the VM hard disk images. We show how to get rid of such pre-built hard disk images altogether. Due to the high requirements on POSIX compliance imposed by HEP application software, CernVM-FS can also be used to host and boot a Linux operating system. This allows the use of a tiny bootable CD image that comprises only a Linux kernel while the rest of the operating system is provided on demand by CernVM-FS. This approach speeds up the initial instantiation time and reduces virtual machine image sizes by an order of magnitude. Furthermore, security updates can be distributed instantaneously through CernVM-FS. By leveraging the fact that CernVM-FS is a versioning file system, a historic analysis environment can be easily re-spawned by selecting the corresponding CernVM-FS file system snapshot.

Submitted 11 November, 2013; originally announced November 2013.

Comments: Conference paper at the 2013 Computing in High Energy Physics (CHEP) Conference, Amsterdam

arXiv:1203.3641 [pdf, ps, other]

doi 10.1016/j.physletb.2012.10.078

Inclusive J/psi production in pp collisions at sqrt(s) = 2.76 TeV

Authors: ALICE Collaboration, B. Abelev, J. Adam, D. Adamova, A. M. Adare, M. M. Aggarwal, G. Aglieri Rinella, A. G. Agocs, A. Agostinelli, S. Aguilar Salazar, Z. Ahammed, A. Ahmad Masoodi, N. Ahmad, S. U. Ahn, A. Akindinov, D. Aleksandrov, B. Alessandro, R. Alfaro Molina, A. Alici, A. Alkin, E. Almaraz Avina, J. Alme, T. Alt, V. Altini, S. Altinpinar, et al. (948 additional authors not shown)

Abstract: The ALICE Collaboration has measured inclusive J/psi production in pp collisions at a center of mass energy sqrt(s)=2.76 TeV at the LHC. The results presented in this Letter refer to the rapidity ranges |y|<0.9 and 2.5<y<4 and have been obtained by measuring the electron and muon pair decay channels, respectively. The integrated luminosities for the two channels are L^e_int=1.1 nb^-1 and L^mu_int=19.9 nb^-1, and the corresponding signal statistics are N_J/psi^e+e-=59 +/- 14 and N_J/psi^mu+mu-=1364 +/- 53. We present dsigma_J/psi/dy for the two rapidity regions under study and, for the forward-y range, d^2sigma_J/psi/dydp_t in the transverse momentum domain 0<p_t<8 GeV/c. The results are compared with previously published results at sqrt(s)=7 TeV and with theoretical calculations.

Submitted 6 November, 2012; v1 submitted 16 March, 2012; originally announced March 2012.

Comments: 7 figures, 3 tables, accepted for publication in Phys. Lett. B

Report number: CERN-PH-EP-2012-055

Journal ref: Phys.Lett.B 718 (2012) 295-306, Phys.Lett.B 748 (2015) 472-473 (erratum)

Showing 1–7 of 7 results for author: Berzano, D