Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics

Amram, Oz; Anzalone, Luca; Birk, Joschka; Faroughy, Darius A.; Hallin, Anna; Kasieczka, Gregor; Krämer, Michael; Pang, Ian; Reyes-Gonzalez, Humberto; Shih, David

High Energy Physics - Phenomenology

arXiv:2412.10504 (hep-ph)

[Submitted on 13 Dec 2024]

Abstract: Foundation models are deep learning models pre-trained on large amounts of data that can generalize to multiple datasets and/or downstream tasks. This work demonstrates how data collected by the CMS experiment at the Large Hadron Collider can be useful in pre-training foundation models for high energy physics (HEP). Specifically, we introduce the AspenOpenJets dataset, consisting of approximately 180M high-$p_T$ jets derived from CMS 2016 Open Data. We show how pre-training the OmniJet-$\alpha$ foundation model on AspenOpenJets improves performance on generative tasks with significant domain shift: generating boosted top and QCD jets from the simulated JetClass dataset. In addition to demonstrating the power of pre-training a jet-based foundation model on actual proton-proton collision data, we provide the ML-ready derived AspenOpenJets dataset for further public use.

Comments: 11 pages, 4 figures, the AspenOpenJets dataset can be found at this http URL
Subjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Machine Learning (stat.ML)
Cite as: arXiv:2412.10504 [hep-ph] (or arXiv:2412.10504v1 [hep-ph] for this version)
DOI: https://doi.org/10.48550/arXiv.2412.10504

Submission history

From: Darius Faroughy
[v1] Fri, 13 Dec 2024 19:00:03 UTC (12,843 KB)
