-
Towards AI-assisted Neutrino Flavor Theory Design
Authors:
Jason Benjamin Baretz,
Max Fieg,
Vijay Ganesh,
Aishik Ghosh,
V. Knapp-Perez,
Jake Rudolph,
Daniel Whiteson
Abstract:
Particle physics theories, such as those which explain neutrino flavor mixing, arise from a vast landscape of model-building possibilities. A model's construction typically relies on the intuition of theorists. It also requires considerable effort to identify appropriate symmetry groups, assign field representations, and extract predictions for comparison with experimental data. We develop an Autonomous Model Builder (AMBer), a framework in which a reinforcement learning agent interacts with a streamlined physics software pipeline to search these spaces efficiently. AMBer selects symmetry groups, particle content, and group representation assignments to construct viable models while minimizing the number of free parameters introduced. We validate our approach in well-studied regions of theory space and extend the exploration to a novel, previously unexamined symmetry group. While demonstrated in the context of neutrino flavor theories, this approach of reinforcement learning with physics software feedback may be extended to other theoretical model-building problems in the future.
Submitted 9 June, 2025;
originally announced June 2025.
-
Learning Broken Symmetries with Approximate Invariance
Authors:
Seth Nabat,
Aishik Ghosh,
Edmund Witkowski,
Gregor Kasieczka,
Daniel Whiteson
Abstract:
Recognizing symmetries in data allows for significant boosts in neural network training, which is especially important where training data are limited. In many cases, however, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data, due to asymmetries in the detector, or varying response resolution as a function of particle momentum. Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry, effectively overconstraining the response of the neural network. We propose a learning model which balances the generality and asymptotic performance of unconstrained networks with the rapid learning of constrained networks. This is achieved through a dual-subnet structure, where one network is constrained by the symmetry and the other is not, along with a learned symmetry factor. In a simplified toy example that demonstrates violation of Lorentz invariance, our model learns as rapidly as symmetry-constrained networks but escapes their performance limitations.
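A minimal sketch of the dual-subnet idea, assuming (this is not confirmed by the abstract) that the two subnets are combined as a convex blend weighted by a sigmoid of one learned scalar. The function names, the `alpha_logit` parameter, and the use of the squared invariant mass as the only input to the constrained subnet are hypothetical choices for illustration.

```python
import numpy as np

def minkowski_mass2(p):
    """Lorentz-invariant squared mass m^2 = E^2 - |p|^2 for four-vectors (E, px, py, pz) on the last axis."""
    return p[..., 0] ** 2 - np.sum(p[..., 1:] ** 2, axis=-1)

def dual_subnet_output(p, f_sym, f_free, alpha_logit):
    """Blend a symmetry-constrained subnet with an unconstrained one (hypothetical combination rule).

    f_sym only ever sees the invariant m^2, so its output is exactly Lorentz invariant;
    f_free sees the raw four-vector and can learn symmetry-breaking effects.
    alpha_logit stands in for a learned scalar; the sigmoid keeps the mixing weight in (0, 1).
    """
    alpha = 1.0 / (1.0 + np.exp(-alpha_logit))
    return alpha * f_sym(minkowski_mass2(p)) + (1.0 - alpha) * f_free(p)
```

In training, `alpha_logit` would be optimized jointly with both subnets: a weight near 1 recovers the fully constrained network, a weight near 0 the unconstrained one.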
Submitted 3 April, 2025; v1 submitted 24 December, 2024;
originally announced December 2024.
-
Reconstruction of boosted and resolved multi-Higgs-boson events with symmetry-preserving attention networks
Authors:
Haoyang Li,
Marko Stamenkovic,
Alexander Shmakov,
Michael Fenton,
Darius Shih-Chieh Chao,
Kaitlyn Maiya White,
Caden Mikkelsen,
Jovan Mitic,
Cristina Mantilla Suarez,
Melissa Quinnan,
Greg Landsberg,
Harvey Newman,
Pierre Baldi,
Daniel Whiteson,
Javier Duarte
Abstract:
The production of multiple Higgs bosons at the CERN LHC provides a direct way to measure the trilinear and quartic Higgs self-interaction strengths as well as potential access to beyond-the-standard-model effects that can enhance production at large transverse momentum $p_{\mathrm{T}}$. The largest event fraction arises from the fully hadronic final state in which every Higgs boson decays to a bottom quark-antiquark pair ($b\bar{b}$). This introduces a combinatorial challenge known as the \emph{jet assignment problem}: assigning jets to sets representing Higgs boson candidates. Symmetry-preserving attention networks (SPA-Nets) have been developed to address this challenge. However, the complexity of jet assignment increases when simultaneously considering both $H\rightarrow b\bar{b}$ reconstruction possibilities, i.e., two "resolved" small-radius jets each containing a shower initiated by a $b$-quark or one "boosted" large-radius jet containing a merged shower initiated by a $b\bar{b}$ pair. The latter improves the reconstruction efficiency at high $p_{\mathrm{T}}$. In this work, we introduce a generalization of the SPA-Net approach to simultaneously consider both boosted and resolved reconstruction possibilities and unambiguously interpret an event as "fully resolved", "fully boosted", or in between. We report the performance of baseline methods, the original SPA-Net approach, and our generalized version on nonresonant $HH$ and $HHH$ production at the LHC. Considering both boosted and resolved topologies, our SPA-Net approach increases the Higgs boson reconstruction purity by 57--62\% and the efficiency by 23--38\% compared to the baseline method, depending on the final state.
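To make the jet assignment problem concrete, the sketch below enumerates every way to assign small-radius jets to resolved $H\rightarrow b\bar{b}$ candidates. It assumes every jet is a $b$-jet candidate, treats the Higgs candidates as interchangeable, and ignores the boosted (large-radius jet) option; the function names are hypothetical and this is not the SPA-Net method, only the combinatorics a brute-force baseline would have to scan.

```python
from itertools import combinations

def pairings(items):
    """Yield every way to split an even-length sequence into unordered pairs (each split exactly once)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def resolved_assignments(n_jets, n_higgs):
    """Yield assignments of n_jets jets to n_higgs H->bb candidates; leftover jets stay unassigned."""
    for chosen in combinations(range(n_jets), 2 * n_higgs):
        yield from pairings(chosen)
```

For example, `sum(1 for _ in resolved_assignments(6, 3))` counts the 15 ways to form three Higgs candidates from six jets; each extra jet multiplies the count, which is why scanning all permutations scales poorly.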
Submitted 28 March, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
The Landscape of Unfolding with Machine Learning
Authors:
Nathan Huetsch,
Javier Mariño Villadamigo,
Alexander Shmakov,
Sascha Diefenbacher,
Vinicius Mikuni,
Theo Heimel,
Michael Fenton,
Kevin Greif,
Benjamin Nachman,
Daniel Whiteson,
Anja Butter,
Tilman Plehn
Abstract:
Recent innovations from machine learning allow for data unfolding, without binning and including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches is evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex observables. Given that these approaches are conceptually diverse, they offer an exciting toolkit for a new class of measurements that can probe the Standard Model with an unprecedented level of detail and may enable sensitivity to new phenomena.
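For contrast with the unbinned ML methods, here is a minimal sketch of a classic binned technique, iterative Bayesian (D'Agostini) unfolding. It is a swapped-in illustration of what "unfolding" means in the binned case, not one of the methods benchmarked in the paper; the function name and arguments are hypothetical.

```python
import numpy as np

def iterative_bayesian_unfolding(data, response, prior, n_iter=4):
    """Binned iterative Bayesian unfolding (sketch).

    data:     observed counts per reco bin, shape (n_reco,)
    response: response[i, j] = P(reco bin i | true bin j); column sums are efficiencies (<= 1)
    prior:    starting guess for the true-bin counts, shape (n_true,)
    Assumes every reco bin has a nonzero expected count and every true bin a nonzero efficiency.
    """
    data = np.asarray(data, dtype=float)
    response = np.asarray(response, dtype=float)
    truth = np.asarray(prior, dtype=float).copy()
    eff = response.sum(axis=0)  # probability that a true-bin event is reconstructed at all
    for _ in range(n_iter):
        folded = response @ truth  # expected reco counts under the current truth estimate
        # Bayes' theorem: P(true j | reco i) = R_ij t_j / sum_k R_ik t_k
        posterior = response * truth[np.newaxis, :] / folded[:, np.newaxis]
        truth = (posterior * data[:, np.newaxis]).sum(axis=0) / eff
    return truth
```

The number of iterations acts as a regularization knob. The ML approaches in the abstract remove the need to choose bins at all.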
Submitted 17 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Full Event Particle-Level Unfolding with Variable-Length Latent Variational Diffusion
Authors:
Alexander Shmakov,
Kevin Greif,
Michael James Fenton,
Aishik Ghosh,
Pierre Baldi,
Daniel Whiteson
Abstract:
The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic top quark pair production at the Large Hadron Collider.
Submitted 23 January, 2025; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Reconstruction of Unstable Heavy Particles Using Deep Symmetry-Preserving Attention Networks
Authors:
Michael James Fenton,
Alexander Shmakov,
Hideki Okawa,
Yuji Li,
Ko-Yang Hsiao,
Shih-Chieh Hsu,
Daniel Whiteson,
Pierre Baldi
Abstract:
Reconstructing unstable heavy particles requires sophisticated techniques to sift through the large number of possible permutations for assignment of detector objects to the underlying partons. An approach based on a generalized attention mechanism, symmetry preserving attention networks (SPA-NET), has been previously applied to top quark pair decays at the Large Hadron Collider which produce only hadronic jets. Here we extend the SPA-NET architecture to consider multiple input object types, such as leptons, as well as global event features, such as the missing transverse momentum. In addition, we provide regression and classification outputs to supplement the parton assignment. We explore the performance of the extended capability of SPA-NET in the context of semi-leptonic decays of top quark pairs as well as top quark pairs produced in association with a Higgs boson. We find significant improvements in the power of three representative studies: a search for ttH, a measurement of the top quark mass, and a search for a heavy Z' decaying to top quark pairs. We present ablation studies to provide insight into what the network has learned in each case.
Submitted 30 April, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Artificial Intelligence for the Electron Ion Collider (AI4EIC)
Authors:
C. Allaire,
R. Ammendola,
E. -C. Aschenauer,
M. Balandat,
M. Battaglieri,
J. Bernauer,
M. Bondì,
N. Branson,
T. Britton,
A. Butter,
I. Chahrour,
P. Chatagnon,
E. Cisbani,
E. W. Cline,
S. Dash,
C. Dean,
W. Deconinck,
A. Deshpande,
M. Diefenthaler,
R. Ent,
C. Fanelli,
M. Finger,
M. Finger, Jr.,
E. Fol,
S. Furletov
, et al. (70 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.
Submitted 17 July, 2023;
originally announced July 2023.
-
Generalizing to new geometries with Geometry-Aware Autoregressive Models (GAAMs) for fast calorimeter simulation
Authors:
Junze Liu,
Aishik Ghosh,
Dylan Smith,
Pierre Baldi,
Daniel Whiteson
Abstract:
Generation of simulated detector response to collision products is crucial to data analysis in particle physics, but computationally very expensive. One subdetector, the calorimeter, dominates the computational time due to the high granularity of its cells and complexity of the interactions. Generative models can provide more rapid sample production, but currently require significant effort to optimize performance for specific detector geometries, often requiring many models to describe the varying cell sizes and arrangements, without the ability to generalize to other geometries. We develop a $\textit{geometry-aware}$ autoregressive model, which learns how the calorimeter response varies with geometry, and is capable of generating simulated responses to unseen geometries without additional training. The geometry-aware model outperforms a baseline unaware model by over $50\%$ in several metrics such as the Wasserstein distance between the generated and the true distributions of key quantities which summarize the simulated response. A single geometry-aware model could replace the hundreds of generative models currently designed for calorimeter simulation by physicists analyzing data collected at the Large Hadron Collider. This proof-of-concept study motivates the design of a foundational model that will be a crucial tool for the study of future detectors, dramatically reducing the large upfront investment usually needed to develop generative calorimeter models.
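The abstract quotes improvements in the Wasserstein distance between generated and true distributions. As a hedged illustration of that metric (not the paper's evaluation code), the 1D Wasserstein-1 distance between two equal-size empirical samples reduces to the mean absolute difference of their sorted values. The function name is hypothetical; `scipy.stats.wasserstein_distance` handles the general unequal-size and weighted case.

```python
import numpy as np

def wasserstein_1d(u, v):
    """W1 distance between two equal-size 1D empirical samples.

    For equal sample sizes the optimal transport plan matches the i-th smallest
    value of u to the i-th smallest value of v, so W1 is the mean |sorted(u) - sorted(v)|.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(u - v)))
```

Applied to a summary quantity such as the total deposited energy, a smaller value means the generated distribution is closer to the true one.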
Submitted 14 November, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
End-To-End Latent Variational Diffusion Models for Inverse Problems in High Energy Physics
Authors:
Alexander Shmakov,
Kevin Greif,
Michael Fenton,
Aishik Ghosh,
Pierre Baldi,
Daniel Whiteson
Abstract:
High-energy collisions at the Large Hadron Collider (LHC) provide valuable insights into open questions in particle physics. However, detector effects must be corrected before measurements can be compared to certain theoretical predictions or measurements from other detectors. Methods to solve this \textit{inverse problem} of mapping detector observations to theoretical quantities of the underlying collision are essential parts of many physics analyses at the LHC. We investigate and compare various generative deep learning methods to approximate this inverse mapping. We introduce a novel unified architecture, termed latent variational diffusion models, which combines the latent learning of cutting-edge generative art approaches with an end-to-end variational framework. We demonstrate the effectiveness of this approach for reconstructing global distributions of theoretical kinematic quantities, as well as for ensuring the adherence of the learned posterior distributions to known physics constraints. Our unified approach achieves a distribution-free distance to the truth over 20 times smaller than the non-latent state-of-the-art baseline and 3 times smaller than traditional latent diffusion models.
Submitted 17 May, 2023;
originally announced May 2023.
-
Geometry-aware Autoregressive Models for Calorimeter Shower Simulations
Authors:
Junze Liu,
Aishik Ghosh,
Dylan Smith,
Pierre Baldi,
Daniel Whiteson
Abstract:
Calorimeter shower simulations are often the bottleneck in simulation time for particle physics detectors. Much effort is currently spent on optimizing generative architectures for specific detector geometries, which generalize poorly. We develop a geometry-aware autoregressive model, trained on a range of calorimeter geometries, that learns to adapt its energy deposition depending on the size and position of the cells. This is a key proof-of-concept step towards building a model that can generalize to new unseen calorimeter geometries with little to no additional training. Such a model can replace the hundreds of generative models used for calorimeter simulation in a Large Hadron Collider experiment. For the study of future detectors, such a model will dramatically reduce the large upfront investment usually needed to generate simulations.
Submitted 15 December, 2022;
originally announced December 2022.
-
Machine-Learning Compression for Particle Physics Discoveries
Authors:
Jack H. Collins,
Yifeng Huang,
Simon Knapen,
Benjamin Nachman,
Daniel Whiteson
Abstract:
In collider-based particle and nuclear physics experiments, data are produced at such extreme rates that only a subset can be recorded for later analysis. Typically, algorithms select individual collision events for preservation and store the complete experimental response. A relatively new alternative strategy is to additionally save a partial record for a larger subset of events, allowing for later specific analysis of a larger fraction of events. We propose a strategy that bridges these paradigms by compressing entire events for generic offline analysis but at a lower fidelity. An optimal-transport-based $β$ Variational Autoencoder (VAE) is used to automate the compression and the hyperparameter $β$ controls the compression fidelity. We introduce a new approach for multi-objective learning functions by simultaneously learning a VAE appropriate for all values of $β$ through parameterization. We present an example use case, a di-muon resonance search at the Large Hadron Collider (LHC), where we show that simulated data compressed by our $β$-VAE has enough fidelity to distinguish distinct signal morphologies.
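A minimal sketch of a $β$-weighted VAE objective with $β$ treated as a per-event input, in the spirit of "learning a VAE appropriate for all values of $β$ through parameterization". Assumptions, not taken from the abstract: a diagonal Gaussian latent with a standard normal prior, a log-uniform sampling range for $β$, and a generic `recon_cost` array standing in for the paper's optimal-transport reconstruction term. All names are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions, one value per event."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)

def beta_vae_loss(recon_cost, mu, logvar, beta):
    """Per-event objective: reconstruction cost + beta * KL.

    recon_cost is a placeholder for an optimal-transport distance between input
    and reconstructed event; beta may be a scalar or one value per event.
    Larger beta compresses harder (lower fidelity); smaller beta keeps more detail.
    """
    return recon_cost + beta * gaussian_kl(mu, logvar)

def sample_betas(n, rng, low=1e-3, high=1.0):
    """Draw log-uniform betas (hypothetical range). In a parameterized VAE each event's
    beta would also be fed to the encoder and decoder as an extra input."""
    return np.exp(rng.uniform(np.log(low), np.log(high), size=n))
```

Sampling a fresh $β$ per event during training is what lets one network serve every compression fidelity at inference time, instead of training a separate VAE per $β$.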
Submitted 18 December, 2022; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning
Authors:
Phiala Shanahan,
Kazuhiro Terao,
Daniel Whiteson
Abstract:
The rapidly-developing intersection of machine learning (ML) with high-energy physics (HEP) presents both opportunities and challenges to our community. Far beyond applications of standard ML tools to HEP problems, genuinely new and potentially revolutionary approaches are being developed by a generation of talent literate in both fields. There is an urgent need to support the needs of the interdisciplinary community driving these developments, including funding dedicated research at the intersection of the two fields, investing in high-performance computing at universities and tailoring allocation policies to support this work, developing community tools and standards, and providing education and career paths for young researchers attracted by the intellectual vitality of machine learning for high energy physics.
Submitted 15 September, 2022;
originally announced September 2022.
-
SPANet: Generalized Permutationless Set Assignment for Particle Physics using Symmetry Preserving Attention
Authors:
Alexander Shmakov,
Michael James Fenton,
Ta-Wei Ho,
Shih-Chieh Hsu,
Daniel Whiteson,
Pierre Baldi
Abstract:
The creation of unstable heavy particles at the Large Hadron Collider is the most direct way to address some of the deepest open questions in physics. Collisions typically produce variable-size sets of observed particles which have inherent ambiguities complicating the assignment of observed particles to the decay products of the heavy particles. Current strategies for tackling these challenges in the physics community ignore the physical symmetries of the decay products, consider all possible assignment permutations, and do not scale to complex configurations. Attention-based deep learning methods for sequence modelling have achieved state-of-the-art performance in natural language processing, but they lack built-in mechanisms to deal with the unique symmetries found in physical set-assignment problems. We introduce a novel method for constructing symmetry-preserving attention networks which reflect the problem's natural invariances to efficiently find assignments without evaluating all permutations. This general approach is applicable to arbitrarily complex configurations and significantly outperforms current methods, improving reconstruction efficiency by 19\% to 35\% on typical benchmark problems while decreasing inference time by two to five orders of magnitude on the most complex events, making many important and previously intractable cases tractable.
A full code repository containing a general library, the specific configuration used, and a complete dataset release is available at https://github.com/Alexanders101/SPANet
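A minimal sketch of the core idea of a symmetry-preserving score, written in NumPy rather than taken from the SPANet library (whose API is not reproduced here). Assumptions: each jet already has a $d$-dimensional embedding, a hadronic top is a $(b, q, q')$ triplet, and the score is a cubic form whose weight tensor is symmetrized over the two interchangeable $W$ daughters. The weight tensor and function names are hypothetical.

```python
import numpy as np

def symmetric_triplet_scores(jets, weight):
    """Score every (b, q, q') jet triplet with one tensor contraction.

    jets:   (n, d) jet embeddings
    weight: (d, d, d) weight tensor; symmetrizing its last two axes makes
            scores[i, j, k] == scores[i, k, j], i.e. invariant to swapping the W daughters.
    """
    w_sym = 0.5 * (weight + weight.transpose(0, 2, 1))
    return np.einsum("ia,jb,kc,abc->ijk", jets, jets, jets, w_sym)

def best_triplet(scores):
    """Highest-scoring (b, q, q') with three distinct jets; j < k since q and q' are interchangeable."""
    n = scores.shape[0]
    masked = scores.astype(float).copy()
    idx = np.arange(n)
    masked[idx, idx, :] = -np.inf  # b jet reused as the first W daughter
    masked[idx, :, idx] = -np.inf  # b jet reused as the second W daughter
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    masked[:, ~upper] = -np.inf    # keep only j < k (this also removes j == k)
    i, j, k = np.unravel_index(np.argmax(masked), masked.shape)
    return int(i), int(j), int(k)
```

Because the scores are computed for all triplets in one contraction and the symmetry is built into the weights, the network never has to loop over equivalent permutations; permuting the input jets simply permutes the score tensor.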
Submitted 22 July, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks
Authors:
Michael James Fenton,
Alexander Shmakov,
Ta-Wei Ho,
Shih-Chieh Hsu,
Daniel Whiteson,
Pierre Baldi
Abstract:
Top quarks, produced in large numbers at the Large Hadron Collider, have a complex detector signature and require special reconstruction techniques. The most common decay mode, the "all-jet" channel, results in a 6-jet final state which is particularly difficult to reconstruct in $pp$ collisions due to the large number of permutations possible. We present a novel approach to this class of problem, based on neural networks using a generalized attention mechanism, that we call Symmetry Preserving Attention Networks (SPA-Net). As an example of the power of this technique, we train one such network to identify the decay products of each top quark unambiguously and without combinatorial explosion. This approach significantly outperforms existing state-of-the-art methods, correctly assigning all jets in $93.0\%$ of $6$-jet, $87.8\%$ of $7$-jet, and $82.6\%$ of $\geq 8$-jet events, respectively.
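To quantify "the large number of permutations possible", the sketch below brute-forces the distinct jet assignments for all-jet $t\bar{t}$ and checks them against a closed-form count. Assumptions: each top decays to $(b, q, q')$, the two $W$ daughters are interchangeable, the two tops are interchangeable, extra jets are left unassigned, and $b$-tagging information is ignored. Function names are hypothetical.

```python
from itertools import permutations
from math import factorial

def canonical(assignment):
    """Collapse the symmetries: q <-> q' within each top, and top <-> top."""
    b1, q1, q1p, b2, q2, q2p = assignment
    top1 = (b1, frozenset((q1, q1p)))
    top2 = (b2, frozenset((q2, q2p)))
    return frozenset((top1, top2))

def distinct_assignments(n_jets):
    """Brute force: every ordered choice of 6 jets, reduced to its symmetry class."""
    return {canonical(p) for p in permutations(range(n_jets), 6)}

def assignment_count(n_jets):
    """Closed form: n! / (n - 6)! ordered choices, divided by the 2 * 2 * 2 symmetry factor."""
    return factorial(n_jets) // (factorial(n_jets - 6) * 8)
```

This gives 90 distinct assignments for 6 jets, 630 for 7, and 2520 for 8, which is the growth that permutation-based methods must scan and that a symmetry-aware network avoids.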
Submitted 14 July, 2022; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Parameterized Machine Learning for High-Energy Physics
Authors:
Pierre Baldi,
Kyle Cranmer,
Taylor Faucett,
Peter Sadowski,
Daniel Whiteson
Abstract:
We investigate a new structure for machine learning classifiers applied to problems in high-energy physics by expanding the inputs to include not only measured features but also physics parameters. The physics parameters represent a smoothly varying learning task, and the resulting parameterized classifier can smoothly interpolate between them and replace sets of classifiers trained at individual values. This simplifies the training process and gives improved performance at intermediate values, even for complex problems requiring deep learning. Applications include tools parameterized in terms of theoretical model parameters, such as the mass of a particle, which allow for a single network to provide improved discrimination across a range of masses. This concept is simple to implement and allows for optimized interpolatable results.
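A minimal sketch of how training data for a mass-parameterized classifier might be assembled: the physics parameter is appended as an extra input column. Background has no true mass, so here each background event is assigned a mass drawn from the signal mass values, so that the parameter alone carries no discriminating information. This recipe and all names are stated as assumptions for illustration.

```python
import numpy as np

def build_parameterized_dataset(sig_x, sig_mass, bkg_x, rng):
    """Append the physics parameter (mass) as an extra feature column.

    sig_x:    (n_sig, d) signal features, generated at masses sig_mass (n_sig,)
    bkg_x:    (n_bkg, d) background features, which have no intrinsic mass
    Returns (x, y) with x of shape (n_sig + n_bkg, d + 1) and labels 1 = signal, 0 = background.
    """
    # Background masses are drawn from the signal mass values, so the mass column
    # alone cannot separate the classes.
    bkg_mass = rng.choice(sig_mass, size=len(bkg_x), replace=True)
    x = np.vstack([np.column_stack([sig_x, sig_mass]),
                   np.column_stack([bkg_x, bkg_mass])])
    y = np.concatenate([np.ones(len(sig_x)), np.zeros(len(bkg_x))])
    return x, y
```

At evaluation time the mass column is set to the hypothesis being tested, including values between the training masses, which is where a single parameterized network can replace a set of classifiers trained at individual values.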
Submitted 28 January, 2016;
originally announced January 2016.
-
Enhanced Higgs to $τ^+τ^-$ Searches with Deep Learning
Authors:
Pierre Baldi,
Peter Sadowski,
Daniel Whiteson
Abstract:
The Higgs boson is thought to provide the interaction that imparts mass to the fundamental fermions, but while measurements at the Large Hadron Collider (LHC) are consistent with this hypothesis, current analysis techniques lack the statistical power to cross the traditional 5$σ$ significance barrier without more data. \emph{Deep learning} techniques have the potential to increase the statistical power of this analysis by \emph{automatically} learning complex, high-level data representations. In this work, deep neural networks are used to detect the decay of the Higgs to a pair of tau leptons. A Bayesian optimization algorithm is used to tune the network architecture and training algorithm hyperparameters, resulting in a deep network of eight non-linear processing layers that improves upon the performance of shallow classifiers even without the use of features specifically engineered by physicists for this application. The improvement in discovery significance is equivalent to an increase in the accumulated dataset of 25\%.
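To translate "equivalent to an increase in the accumulated dataset of 25\%" into a gain in significance, assume (this is not stated in the abstract) the simple approximation $Z \approx s/\sqrt{b}$ for expected significance, valid when $s \ll b$ and systematic uncertainties are negligible. The symbols $\sigma$ (cross section), $\epsilon$ (selection efficiency) and $\mathcal{L}$ (integrated luminosity) are introduced here for illustration.

```latex
Z \approx \frac{s}{\sqrt{b}}, \qquad s = \sigma_s \epsilon_s \mathcal{L}, \quad b = \sigma_b \epsilon_b \mathcal{L}
\;\;\Longrightarrow\;\; Z \propto \sqrt{\mathcal{L}},
\qquad \frac{Z(1.25\,\mathcal{L})}{Z(\mathcal{L})} = \sqrt{1.25} \approx 1.12
```

Under this approximation, a 25\% larger dataset corresponds to roughly a 12\% higher expected discovery significance.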
Submitted 13 October, 2014;
originally announced October 2014.