-
Optimisation of ATLAS computing resource usage through a modern HEP Benchmark Suite via HammerCloud and Big PanDA
Authors:
Natalia Szczepanek,
Domenico Giordano,
Ivan Glushkov,
Gonzalo Menendez Borge,
Alessandro Di Girolamo,
Alexander Lory,
Ilija Vukotic
Abstract:
In April 2023, HEPScore23, the new benchmark based on HEP specific applications, was adopted by WLCG, replacing HEP-SPEC06. As part of the transition to the new benchmark, the CPU corepower published by the sites needed to be compared with the effective power observed while running ATLAS workloads. One aim was to verify the conversion rate between the scores of the old and the new benchmark. The o…
▽ More
In April 2023, HEPScore23, the new benchmark based on HEP specific applications, was adopted by WLCG, replacing HEP-SPEC06. As part of the transition to the new benchmark, the CPU corepower published by the sites needed to be compared with the effective power observed while running ATLAS workloads. One aim was to verify the conversion rate between the scores of the old and the new benchmark. The other objective was to understand how the HEPScore performs when run on multi-core job slots, so exactly like the computing sites are being used in the production environment. Our study leverages the HammerCloud infrastructure and the PanDA Workload Management System to collect a large benchmark statistic across 136 computing sites using an enhanced HEP Benchmark Suite. It allows us to collect not only performance metrics, but, thanks to plugins, it also collects information such as machine load, memory usage and other user-defined metrics during the execution and stores it in an OpenSearch database. These extensive tests allow for an in-depth analysis of the actual, versus declared computing capabilities of these sites. The results provide valuable insights into the real-world performance of computing resources pledged to ATLAS, identifying areas for improvement while spotlighting sites that underperform or exceed expectations. Moreover, this helps to ensure efficient operational practices across sites. The collected metrics allowed us to detect and fix configuration issues and therefore improve the experienced performance.
△ Less
Submitted 7 February, 2025;
originally announced February 2025.
-
An Analysis of Word2Vec for the Italian Language
Authors:
Giovanni Di Gennaro,
Amedeo Buonanno,
Antonio Di Girolamo,
Armando Ospedale,
Francesco A. N. Palmieri,
Gianfranco Fedele
Abstract:
Word representation is fundamental in NLP tasks, because it is precisely from the coding of semantic closeness between words that it is possible to think of teaching a machine to understand text. Despite the spread of word embedding concepts, still few are the achievements in linguistic contexts other than English. In this work, analysing the semantic capacity of the Word2Vec algorithm, an embeddi…
▽ More
Word representation is fundamental in NLP tasks, because it is precisely from the coding of semantic closeness between words that it is possible to think of teaching a machine to understand text. Despite the spread of word embedding concepts, still few are the achievements in linguistic contexts other than English. In this work, analysing the semantic capacity of the Word2Vec algorithm, an embedding for the Italian language is produced. Parameter setting such as the number of epochs, the size of the context window and the number of negatively backpropagated samples is explored.
△ Less
Submitted 25 January, 2020;
originally announced January 2020.
-
Intent Classification in Question-Answering Using LSTM Architectures
Authors:
Giovanni Di Gennaro,
Amedeo Buonanno,
Antonio Di Girolamo,
Armando Ospedale,
Francesco A. N. Palmieri
Abstract:
Question-answering (QA) is certainly the best known and probably also one of the most complex problem within Natural Language Processing (NLP) and artificial intelligence (AI). Since the complete solution to the problem of finding a generic answer still seems far away, the wisest thing to do is to break down the problem by solving single simpler parts. Assuming a modular approach to the problem, w…
▽ More
Question-answering (QA) is certainly the best known and probably also one of the most complex problem within Natural Language Processing (NLP) and artificial intelligence (AI). Since the complete solution to the problem of finding a generic answer still seems far away, the wisest thing to do is to break down the problem by solving single simpler parts. Assuming a modular approach to the problem, we confine our research to intent classification for an answer, given a question. Through the use of an LSTM network, we show how this type of classification can be approached effectively and efficiently, and how it can be properly used within a basic prototype responder.
△ Less
Submitted 25 January, 2020;
originally announced January 2020.
-
Rucio - Scientific data management
Authors:
Martin Barisits,
Thomas Beermann,
Frank Berghaus,
Brian Bockelman,
Joaquin Bogado,
David Cameron,
Dimitrios Christidis,
Diego Ciangottini,
Gancho Dimitrov,
Markus Elsing,
Vincent Garonne,
Alessandro di Girolamo,
Luc Goossens,
Wen Guan,
Jaroslav Guenther,
Tomas Javurek,
Dietmar Kuhn,
Mario Lassnig,
Fernando Lopez,
Nicolo Magini,
Angelos Molfetas,
Armin Nairz,
Farid Ould-Saada,
Stefan Prenner,
Cedric Serfon
, et al. (5 additional authors not shown)
Abstract:
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and now is continuously extended to support t…
▽ More
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and now is continuously extended to support the LHC experiments and other diverse scientific communities. In this article, we detail the fundamental concepts of Rucio, describe the architecture along with implementation details, and give operational experience from production usage.
△ Less
Submitted 6 June, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Federating distributed storage for clouds in ATLAS
Authors:
Frank Berghaus,
Kevin Casteels,
Alessandro Di Girolamo,
Colson Driemel,
Marcus Ebert,
Fabrizio Furano,
Fernado Galindo,
Mario Lassnig,
Colin Leavett-Brown,
Michael Paterson,
Cedric Serfon,
Rolf Seuster,
Randall Sobie,
Reda Tafirout,
Ryan Paul Taylor
Abstract:
Input data for applications that run in cloud computing centres can be stored at distant repositories, often with multiple copies of the popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. A federation would locate the closest copy of the data on the basis of GeoIP information. Currently…
▽ More
Input data for applications that run in cloud computing centres can be stored at distant repositories, often with multiple copies of the popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. A federation would locate the closest copy of the data on the basis of GeoIP information. Currently we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry standards for connection protocols like Amazon's S3, Microsoft's Azure, as well as WebDAV and HTTP. Dynafed functions as an abstraction layer under which protocol-dependent authentication details are hidden from the user, requiring the user to only provide an X509 certificate. We have setup an instance of Dynafed and integrated it into the ATLAS data distribution management system. We report on the challenges faced during the installation and integration. We have tested ATLAS analysis jobs submitted by the PanDA production system and we report on our first experiences with its operation.
△ Less
Submitted 26 October, 2018;
originally announced October 2018.
-
Automating ATLAS Computing Operations using the Site Status Board
Authors:
Julia Andreeva,
Carlos Borrego Iglesias,
Simone Campana,
Alessandro Di Girolamo,
Ivan Dzhunov,
Xavier Espinal Curull,
Stavro Gayazov,
Erekle Magradze,
Michal Maciej Nowotka,
Lorenzo Rinaldi,
Pablo Saiz,
Jaroslava Schovancova,
Graeme Andrew Stewart,
Michael Wright
Abstract:
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and…
▽ More
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities, in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will overview future plans.
△ Less
Submitted 28 January, 2013; v1 submitted 1 January, 2013;
originally announced January 2013.