-
The 200 Gbps Challenge: Imagining HL-LHC analysis facilities
Authors:
Alexander Held,
Sam Albin,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Lincoln Bryant,
Kyungeon Choi,
Kyle Cranmer,
Peter Elmer,
Matthew Feickert,
Rob Gardner,
Lindsey Gray,
Fengping Hu,
David Lange,
Carl Lundstedt,
Peter Onyisi,
Jim Pivarski,
Oksana Shadura,
Nick Smith,
John Thiltges,
Ben Tovar,
Ilija Vukotic,
Gordon Watts,
Derek Weitzel,
Andrew Wightman
Abstract:
The IRIS-HEP software institute, as a contributor to the broader HEP Python ecosystem, is developing scalable analysis infrastructure and software tools to address the upcoming HL-LHC computing challenges with new approaches and paradigms, driven by our vision of what HL-LHC analysis will require. The institute uses a "Grand Challenge" format, constructing a series of increasingly large, complex, and realistic exercises to show the vision of HL-LHC analysis. Recently, the focus has been demonstrating the IRIS-HEP analysis infrastructure at scale and evaluating technology readiness for production.
As a part of the Analysis Grand Challenge activities, the institute executed a "200 Gbps Challenge", aiming to show sustained data rates into the event processing of multiple analysis pipelines. The challenge integrated teams internal and external to the institute, including operations and facilities, analysis software tools, innovative data delivery and management services, and scalable analysis infrastructure. The challenge showcases the prototypes - including software, services, and facilities - built to process around 200 TB of data in both the CMS NanoAOD and ATLAS PHYSLITE data formats with test pipelines.
The teams were able to sustain the 200 Gbps target across multiple pipelines. The pipelines focusing on event rate were able to process at over 30 MHz. These target rates are demanding; the activity revealed considerations for future testing at this scale and changes necessary for physicists to work at this scale in the future. The 200 Gbps Challenge has established a baseline on today's facilities, setting the stage for the next exercise at twice the scale.
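To give a sense of scale for the figures quoted above, the following is a minimal back-of-the-envelope sketch, not part of the paper. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second) and, hypothetically, that the full 200 TB volume is read at a constant rate, which the abstract does not state. The function name is illustrative only.

```python
def transfer_time_seconds(volume_tb: float, rate_gbps: float) -> float:
    """Time to move `volume_tb` terabytes at a sustained `rate_gbps` gigabits per second.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s.
    """
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    total_bits = volume_tb * 1e12 * 8   # bytes -> bits
    return total_bits / (rate_gbps * 1e9)


# Hypothetical scenario: the whole 200 TB read at a constant 200 Gbps.
seconds = transfer_time_seconds(200, 200)
print(f"{seconds:.0f} s = {seconds / 3600:.2f} h")
```

Under these assumptions the transfer takes 8000 s, a little over two hours. Doubling the sustained rate for the same volume halves that time.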
Submitted 19 March, 2025;
originally announced March 2025.
-
Tuning the CMS Coffea-casa facility for 200 Gbps Challenge
Authors:
Sam Albin,
Garhan Attebury,
Kenneth Bloom,
Brian Paul Bockelman,
Benjamin Tovar Lopez,
Carl Lundstedt,
Oksana Shadura,
John Thiltges,
Derek Weitzel,
Andrew Wightman
Abstract:
As a part of the IRIS-HEP "Analysis Grand Challenge" activities, the Coffea-casa AF team executed a "200 Gbps Challenge". One of the goals of this challenge was to provide a setup for execution of a test notebook-style analysis on the facility that could process a 200 TB CMS NanoAOD dataset in 20 minutes.
We describe the solutions we deployed at the facility to execute the challenge tasks. The facility was configured to provide 2000+ cores for quick turn-around, low-latency analysis. To reach the highest event processing rates we tested different scaling backends, scaling over both HTCondor and Kubernetes resources and using both the Dask and TaskVine schedulers. This configuration also allowed us to compare two different services for managing Dask clusters, the Dask labextension and the Dask Gateway server, under extreme conditions.
A robust set of XCache servers with a redirector was deployed in Kubernetes to cache the dataset and minimize wide-area network traffic. The XCache servers were backed by solid-state NVMe drives deployed within the Kubernetes cluster nodes. All data access was authenticated using SciTokens and was transparent to the user. To track and measure data throughput precisely, we used our existing Prometheus monitoring stack to monitor the XCache pod throughput on the Kubernetes network layer. Using the rate query across all 8 XCache pods, we were able to view a stacked cumulative graph of the total throughput for each XCache. This monitoring setup allowed us to ensure uniform data rates across all nodes while verifying that we had reached the 200 Gbps benchmark.
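The throughput check described above can be illustrated with a minimal sketch. This is not the facility's monitoring code: it approximates a rate-style calculation over cumulative byte counters in plain Python, and the pod names, the sample layout, and the helper functions are all hypothetical. Unlike the Prometheus rate query, it does not handle counter resets or window extrapolation.

```python
from typing import Dict, List, Tuple

# One sample: (timestamp in seconds, cumulative bytes transmitted by the pod).
Sample = Tuple[float, float]


def rate_gbps(samples: List[Sample]) -> float:
    """Average rate over the sampled window in Gbps (1 Gbps = 1e9 bits/s)."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples with increasing timestamps")
    return (b1 - b0) * 8 / (t1 - t0) / 1e9


def pod_and_total_rates(per_pod: Dict[str, List[Sample]]) -> Tuple[Dict[str, float], float]:
    """Per-pod rates plus their sum, i.e. the top of a stacked throughput graph."""
    rates = {pod: rate_gbps(samples) for pod, samples in per_pod.items()}
    return rates, sum(rates.values())


# Hypothetical data: 8 pods, each sending 1.875e11 bytes in a 60 s window.
counters = {f"xcache-{i}": [(0.0, 0.0), (60.0, 1.875e11)] for i in range(8)}
rates, total = pod_and_total_rates(counters)
print(rates["xcache-0"], total, total >= 200.0)
```

In this invented example each pod averages 25 Gbps, so the 8 pods together reach the 200 Gbps target. Comparing the per-pod values is also a simple way to check that the load is spread uniformly.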
Submitted 17 March, 2025;
originally announced March 2025.
-
Coffea-Casa: Building composable analysis facilities for the HL-LHC
Authors:
Sam Albin,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Carl Lundstedt,
Oksana Shadura,
John Thiltges
Abstract:
The large data volumes expected from the High Luminosity LHC (HL-LHC) present challenges to existing paradigms and facilities for end-user data analysis. Modern cyberinfrastructure tools provide a diverse set of services that can be composed into a system that provides physicists with powerful tools that give them straightforward access to large computing resources, with low barriers to entry. The Coffea-Casa analysis facility (AF) provides an environment for end users enabling the execution of increasingly complex analyses such as those demonstrated by the Analysis Grand Challenge (AGC) and capturing the features that physicists will need for the HL-LHC.
We describe the development progress of the Coffea-Casa facility, featuring its modularity while demonstrating the ability to port and customize the facility software stack to other locations. The facility also supports batch systems while staying Kubernetes-native. We present the evolved architecture of the facility, such as the integration of advanced data delivery services (e.g. ServiceX) and making data caching services (e.g. XCache) available to end users of the facility. We also highlight the composability of modern cyberinfrastructure tools. To enable machine learning pipelines at Coffea-Casa analysis facilities, a set of industry ML solutions adopted for HEP columnar analysis were integrated on top of existing facility services. These services also feature transparent access for user workflows to GPUs available at a facility via inference servers, using Kubernetes as the enabling technology.
Submitted 30 November, 2023;
originally announced December 2023.
-
Collaborative Computing Support for Analysis Facilities Exploiting Software as Infrastructure Techniques
Authors:
Maria Acosta Flechas,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Lindsey Gray,
Burt Holzman,
Carl Lundstedt,
Oksana Shadura,
Nicholas Smith,
John Thiltges
Abstract:
Prior to the public release of Kubernetes it was difficult to conduct joint development of elaborate analysis facilities due to the highly non-homogeneous nature of hardware and network topology across compute facilities. However, since the advent of systems like Kubernetes and OpenShift, which provide declarative interfaces for building fault-tolerant and self-healing deployments of networked software, it is possible for multiple institutes to collaborate more effectively since resource details are abstracted away through various forms of hardware and software virtualization. In this whitepaper we outline the development of two analysis facilities, "Coffea-casa" at the University of Nebraska-Lincoln and the "Elastic Analysis Facility" at Fermilab, and describe how platform abstraction has improved the development of common software for each of these facilities, along with the future development plans made possible by this methodology.
Submitted 22 March, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Search for Pair Production of Light Scalar Top Quarks in p-pbar Collisions at sqrt{s}=1.8 TeV
Authors:
Carl Lundstedt,
Dan Claes
Abstract:
Using 85.2 +/- 3.6 pb^-1 of p-pbar collisions collected at sqrt(s)=1.8 TeV with the D0 detector at Fermilab's Tevatron Collider, we present the results of a search for direct pair production of scalar top quarks ~t, the supersymmetric partners of the top quark. We examined events containing two or more jets and missing transverse energy, the signature of light scalar top quark decays to charm quarks and neutralinos. After selections, we observe 27 events while expecting 31.1 +/- 6.4 events from known standard model processes. Comparing these results to next-to-leading-order production cross sections, we exclude a significant region of ~t and neutralino phase space. In particular, we exclude the ~t mass m_~t < 122 GeV/c^2 for a neutralino mass of 45 GeV/c^2.
Submitted 22 April, 2004;
originally announced April 2004.