Search | arXiv e-print repository

doi 10.1093/mnras/stad209

Revealing the Milky Way's Most Recent Major Merger with a Gaia EDR3 Catalog of Machine-Learned Line-of-Sight Velocities

Authors: Adriana Dropulic, Hongwan Liu, Bryan Ostdiek, Mariangela Lisanti

Abstract: Machine learning can play a powerful role in inferring missing line-of-sight velocities from astrometry in surveys such as Gaia. In this paper, we apply a neural network to Gaia Early Data Release 3 (EDR3) and obtain line-of-sight velocities and associated uncertainties for ~92 million stars. The network, which takes as input a star's parallax, angular coordinates, and proper motions, is trained and validated on ~6.4 million stars in Gaia with complete phase-space information. The network's uncertainty on its velocity prediction is a key aspect of its design; by properly convolving these uncertainties with the inferred velocities, we obtain accurate stellar kinematic distributions. As a first science application, we use the new network-completed catalog to identify candidate stars that belong to the Milky Way's most recent major merger, Gaia-Sausage-Enceladus (GSE). We present the kinematic, energy, angular momentum, and spatial distributions of the ~450,000 GSE candidates in this sample, and also study the chemical abundances of those with cross matches to GALAH and APOGEE. The network's predictive power will only continue to improve with future Gaia data releases as the training set of stars with complete phase-space information grows. This work provides a first demonstration of how to use machine learning to exploit high-dimensional correlations on data to infer line-of-sight velocities, and offers a template for how to train, validate and apply such a neural network when complete observational data is not available.

Submitted 24 May, 2022; originally announced May 2022.

Comments: 18 pages, 11 figures
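The abstract above relies on a network that outputs both a velocity and its uncertainty, and on convolving those uncertainties into the kinematic distributions. A minimal Python sketch of one common way to do both follows. The Gaussian negative log-likelihood loss and the sampling step are assumptions for illustration, not the paper's published code, and the function names are hypothetical.

```python
import math
import random


def gaussian_nll(v_true, mu, sigma):
    """Negative log-likelihood of v_true under a Gaussian N(mu, sigma^2).

    Assumed loss: minimizing it lets a network learn both a prediction (mu)
    and its own per-star uncertainty (sigma).
    """
    return 0.5 * math.log(2 * math.pi * sigma**2) + (v_true - mu) ** 2 / (2 * sigma**2)


def sample_velocities(predictions, n_draws=100, seed=0):
    """Draw velocities from each star's predicted Gaussian.

    predictions: iterable of (mu, sigma) pairs, one per star.
    Stacking the draws approximates the predicted velocity distribution
    convolved with the per-star uncertainties, rather than using mu alone.
    """
    rng = random.Random(seed)
    draws = []
    for mu, sigma in predictions:
        draws.extend(rng.gauss(mu, sigma) for _ in range(n_draws))
    return draws
```

Histogramming the output of sample_velocities, instead of the point estimates, is what broadens each star's contribution by its predicted uncertainty.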

arXiv:2203.01343

doi 10.1103/PhysRevD.106.035014

Creating Simple, Interpretable Anomaly Detectors for New Physics in Jet Substructure

Authors: Layne Bradshaw, Spencer Chang, Bryan Ostdiek

Abstract: Anomaly detection with convolutional autoencoders is a popular method to search for new physics in a model-agnostic manner. These techniques are powerful, but they are still a "black box," since we do not know what high-level physical observables determine how anomalous an event is. To address this, we adapt a recently proposed technique by Faucett et al., which maps out the physical observables learned by a neural network classifier, to the case of anomaly detection. We propose two different strategies that use a small number of high-level observables to mimic the decisions made by the autoencoder on background events, one designed to directly learn the output of the autoencoder, and the other designed to learn the difference between the autoencoder's outputs on a pair of events. Despite the underlying differences in their approach, we find that both strategies have ordering performance similar to that of the autoencoder and independently use the same six high-level observables. From there, we compare the performance of these networks as anomaly detectors. We find that both strategies perform similarly to the autoencoder across a variety of signals, giving a nontrivial demonstration that learning to order background events transfers to ordering a variety of signal events.

Submitted 9 September, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: 17 pages, 8 figures; Corrections made for p_T, m calculations; analysis rerun with conclusions largely unchanged. Published version
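The abstract above judges the two strategies by how well they reproduce the autoencoder's ordering of events. One simple way to quantify this, assumed here for illustration and not necessarily the exact metric used in the paper, is the fraction of event pairs that two scores rank in the same order. The function name is hypothetical.

```python
from itertools import combinations


def decision_ordering_agreement(scores_a, scores_b):
    """Fraction of event pairs that two scores rank in the same order.

    scores_a, scores_b: scores for the same events, e.g. an autoencoder's
    anomaly score and the output of a small network on high-level
    observables. Pairs tied under either score are skipped.
    """
    agree = total = 0
    for i, j in combinations(range(len(scores_a)), 2):
        da = scores_a[i] - scores_a[j]
        db = scores_b[i] - scores_b[j]
        if da == 0 or db == 0:
            continue
        total += 1
        agree += (da > 0) == (db > 0)  # True counts as 1
    return agree / total if total else float("nan")
```

Only the ordering matters here, so any monotonic rescaling of either score leaves the agreement unchanged. The pairwise loop is quadratic in the number of events, which is fine for a sketch but would need subsampling on large datasets.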

arXiv:2110.06948

doi 10.1007/JHEP03(2022)066

Challenges for Unsupervised Anomaly Detection in Particle Physics

Authors: Katherine Fraser, Samuel Homiller, Rashmish K. Mishra, Bryan Ostdiek, Matthew D. Schwartz

Abstract: Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals). In this paper, we study some challenges associated with variational autoencoders, such as the dependence on hyperparameters and the metric used, in the context of anomalous signal (top and $W$) jets in a QCD background. We find that the hyperparameter choices strongly affect the network performance and that the optimal parameters for one signal are non-optimal for another. In exploring the networks, we uncover a connection between the latent space of a variational autoencoder trained using mean-squared-error and the optimal transport distances within the dataset. We then show that optimal transport distances to representative events in the background dataset can be used directly for anomaly detection, with performance comparable to the autoencoders. Whether using autoencoders or optimal transport distances for anomaly detection, we find that the choices that best represent the background are not necessarily best for signal identification. These challenges with unsupervised anomaly detection bolster the case for additional exploration of semi-supervised or alternative approaches.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 22 + 2 pages, 8 figures, 2 tables
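The abstract above uses optimal transport distances to representative background events directly as an anomaly score. Jets are really pT-weighted particle clouds, so the true problem is a weighted 2D transport problem; the Python sketch below deliberately simplifies to equal-size, equally weighted 1D samples, where the optimal plan simply matches sorted values. The event representation and the choice of representative events are assumptions, and the function names are hypothetical.

```python
def wasserstein_1d(a, b):
    """1-Wasserstein distance between equal-size, equally weighted 1D samples.

    In this special case the optimal transport plan pairs the i-th smallest
    value of a with the i-th smallest value of b.
    """
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def ot_anomaly_score(event, reference_events):
    """Distance from an event to its nearest representative background event.

    Larger scores mean the event is far from every reference, i.e. more anomalous.
    """
    return min(wasserstein_1d(event, ref) for ref in reference_events)
```

As the abstract notes, which reference events best represent the background need not be the choice that best identifies a given signal.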

arXiv:2105.14027

doi 10.21468/SciPostPhys.12.1.043

The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

Authors: T. Aarrestad, M. van Beekveld, M. Bona, A. Boveia, S. Caron, J. Davies, A. De Simone, C. Doglioni, J. M. Duarte, A. Farbin, H. Gupta, L. Hendriks, L. Heinrich, J. Howarth, P. Jawahar, A. Jueid, J. Lastow, A. Leinweber, J. Mamuzic, E. Merényi, A. Morandini, P. Moskvitina, C. Nellist, J. Ngadiuba, B. Ostdiek, et al. (14 additional authors not shown)

Abstract: We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to $10~\rm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.

Submitted 9 December, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: v1: 54 pages, 24 figures. v2: 56 pages, citations added, extended discussion of the look-elsewhere effect, results unchanged. v3: minor typos corrected and references updated

Journal ref: SciPost Phys. 12, 043 (2022)
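The abstract above proposes using an anomaly score to define model-independent signal regions. A minimal Python sketch of one common way to do this, assumed here rather than taken from the challenge code: place a cut on the score so that a fixed fraction of simulated background passes, and call everything above the cut the signal region. The function names are hypothetical.

```python
def score_threshold(background_scores, background_efficiency):
    """Cut value that keeps about `background_efficiency` of background events.

    Events with score >= threshold form the signal region. Ties at the
    threshold can make the realized efficiency slightly larger.
    """
    if not 0.0 < background_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    ranked = sorted(background_scores, reverse=True)
    n_keep = max(1, round(background_efficiency * len(ranked)))
    return ranked[n_keep - 1]


def passing_fraction(scores, threshold):
    """Fraction of events whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)
```

Evaluating passing_fraction on a signal sample at this threshold gives the signal efficiency at fixed background efficiency, one natural way to compare scores; the metrics actually used in the challenge may differ.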

arXiv:2103.14039

doi 10.3847/2041-8213/ac09ef

Machine Learning the 6th Dimension: Stellar Radial Velocities from 5D Phase-Space Correlations

Authors: Adriana Dropulic, Bryan Ostdiek, Laura J. Chang, Hongwan Liu, Timothy Cohen, Mariangela Lisanti

Abstract: The Gaia satellite will observe the positions and velocities of over a billion Milky Way stars. In the early data releases, the majority of observed stars do not have complete 6D phase-space information. In this Letter, we demonstrate the ability to infer the missing line-of-sight velocities until more spectroscopic observations become available. We utilize a novel neural network architecture that, after being trained on a subset of data with complete phase-space information, takes in a star's 5D astrometry (angular coordinates, proper motions, and parallax) and outputs a predicted line-of-sight velocity with an associated uncertainty. Working with a mock Gaia catalog, we show that the network can successfully recover the distributions and correlations of each velocity component for stars that fall within ~5 kpc of the Sun. We also demonstrate that the network can accurately reconstruct the velocity distribution of a kinematic substructure in the stellar halo that is spatially uniform, even when it comprises a small fraction of the total star count.

Submitted 6 July, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

Comments: 7+9 pages, 3+4 figures, v2: minor revisions (typos and clarification), method and conclusions unchanged, published in ApJL

Journal ref: ApJL 915 L14 (2021)
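The 5D astrometry mentioned above already fixes a star's distance and its velocity across the sky; only the line-of-sight component needs spectroscopy, which is what the network predicts. As a worked illustration of the standard conversion (not the paper's network), the tangential speed is v_tan = 4.74047 μ / ϖ km/s, with total proper motion μ in mas/yr and parallax ϖ in mas. The sketch assumes naive parallax inversion, which is only reasonable for small fractional parallax errors; the function names are hypothetical.

```python
import math

# 1 AU/yr expressed in km/s; turns (mas/yr) / (mas) into km/s.
K_KMS = 4.74047


def distance_kpc(parallax_mas):
    """Naive distance estimate: d [kpc] = 1 / parallax [mas]."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive for naive inversion")
    return 1.0 / parallax_mas


def tangential_velocity(pmra_mas_yr, pmdec_mas_yr, parallax_mas):
    """Heliocentric tangential speed in km/s from proper motions and parallax.

    pmra is assumed to already include the cos(dec) factor, as Gaia's pmra does.
    """
    mu_total = math.hypot(pmra_mas_yr, pmdec_mas_yr)
    return K_KMS * mu_total * distance_kpc(parallax_mas)
```

For example, a star with proper-motion components (3, 4) mas/yr at 1 mas parallax (about 1 kpc) moves at roughly 23.7 km/s across the sky.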

arXiv:2009.06663

doi 10.1051/0004-6361/202142030

Image segmentation for analyzing galaxy-galaxy strong lensing systems

Authors: Bryan Ostdiek, Ana Diaz Rivero, Cora Dvorkin

Abstract: The goal of this paper is to develop a machine learning model to analyze the main gravitational lens and detect dark substructure (subhalos) within simulated images of strongly lensed galaxies. Using the technique of image segmentation, we turn the task of identifying subhalos into a classification problem, where we label each pixel in an image as coming from the main lens, a subhalo within a binned mass range, or neither. Our network is only trained on images with a single smooth lens and either zero or one subhalo near the Einstein ring. On an independent test set with lenses with large ellipticities, quadrupole and octopole moments, and for source apparent magnitudes between 17 and 25, the area of the main lens is recovered accurately. On average, only 1.3% of the true area is missed and 1.2% of the true area is added to another part of the lens. In addition, subhalos as light as $10^{8.5}M_{\odot}$ can be detected if they lie in bright pixels along the Einstein ring. Furthermore, the model is able to generalize to new contexts it has not been trained on, such as locating multiple subhalos with varying masses or more than one large smooth lens.

Submitted 26 January, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

Comments: v1: 5 + 3 pages, 3 figures. v2: matches accepted version in A&A Letters. 5+2 pages; 3+1 figures

Journal ref: A&A 657, L14 (2022)
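The abstract above reports the main-lens area as a fraction missed and a fraction added relative to the true area. A minimal Python sketch of how such numbers can be computed from per-pixel label masks follows, under our reading of those two quantities. The label encoding (0 for neither, 1 for the main lens, 2 and up for subhalo mass bins) and the function name are hypothetical.

```python
def area_errors(true_mask, pred_mask, label):
    """Missed and added area for one class, as fractions of the true area.

    true_mask, pred_mask: 2D lists of integer pixel labels with the same shape.
    missed: true pixels of `label` that the prediction assigns elsewhere.
    added: predicted pixels of `label` that are not `label` in the truth.
    """
    true_area = missed = added = 0
    for t_row, p_row in zip(true_mask, pred_mask):
        for t, p in zip(t_row, p_row):
            if t == label:
                true_area += 1
                if p != label:
                    missed += 1
            elif p == label:
                added += 1
    if true_area == 0:
        raise ValueError("label does not appear in the true mask")
    return missed / true_area, added / true_area
```

Averaging these two fractions over a test set of images gives summary numbers of the kind quoted in the abstract.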

arXiv:2009.06639

doi 10.3847/1538-4357/ac2d8d

Extracting the Subhalo Mass Function from Strong Lens Images with Image Segmentation

Authors: Bryan Ostdiek, Ana Diaz Rivero, Cora Dvorkin

Abstract: Detecting substructure within strongly lensed images is a promising route to shed light on the nature of dark matter. However, it is a challenging task, which traditionally requires detailed lens modeling and source reconstruction, taking weeks to analyze each system. We use machine learning to circumvent the need for lens and source modeling and develop a neural network to both locate subhalos in an image and determine their mass using the technique of image segmentation. The network is trained on images with a single subhalo located near the Einstein ring across a wide range of apparent source magnitudes. The network is then able to resolve subhalos with masses $m\gtrsim 10^{8.5} M_{\odot}$. Training in this way allows the network to learn the gravitational lensing of light, and remarkably, it is then able to detect entire populations of substructure, even for locations further away from the Einstein ring than those used in training. Over a wide range of the apparent source magnitude, the false-positive rate is around three false subhalos per 100 images, coming mostly from the lightest detectable subhalo for that signal-to-noise ratio. With good accuracy and a low false-positive rate, counting the number of pixels assigned to each subhalo class over multiple images allows for a measurement of the subhalo mass function (SMF). When measured over three mass bins from $10^9M_{\odot}$--$10^{10} M_{\odot}$, the SMF slope is recovered with an error of 36% for 50 images, and this improves to 10% for 1000 images with Hubble Space Telescope-like noise.

Submitted 14 February, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

Comments: v1: 23 + 5 pages, 12 + 2 figures. v2: matches accepted version in ApJ. v3: matches version published in ApJ
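The abstract above recovers the subhalo mass function slope from counts in three mass bins. A minimal Python sketch of the final fitting step follows: a least-squares line through log counts versus log mass. It assumes the per-bin subhalo counts have already been inferred from the pixel counts (a step not shown), and the function name is hypothetical. For logarithmically spaced bins of equal width, counts following dN/dm ∝ m^α have a log-log slope of α + 1, since integrating m^α over each bin gives a result proportional to the bin center raised to α + 1.

```python
import math


def fit_power_law_slope(mass_centers, counts):
    """Least-squares slope of log10(counts) versus log10(mass).

    mass_centers: bin centers (e.g. in solar masses), log-spaced.
    counts: positive per-bin subhalo counts.
    Returns the log-log slope, which equals alpha + 1 for dN/dm ∝ m**alpha
    with equal-width logarithmic bins.
    """
    if len(mass_centers) != len(counts) or len(counts) < 2:
        raise ValueError("need at least two bins with matching lengths")
    xs = [math.log10(m) for m in mass_centers]
    ys = [math.log10(c) for c in counts]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx
```

With only three bins the fit has one degree of freedom, so the slope's uncertainty is driven mainly by Poisson noise in the counts, which is consistent with the abstract's error shrinking as more images are added.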

arXiv:1709.10106

doi 10.1103/PhysRevD.97.056009

What is the Machine Learning?

Authors: Spencer Chang, Timothy Cohen, Bryan Ostdiek

Abstract: Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables -- aided by physical intuition -- that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.

Submitted 28 March, 2018; v1 submitted 28 September, 2017; originally announced September 2017.

Comments: 6 pages, 3 figures. Version published in PRD, discussion added

Journal ref: Phys. Rev. D 97, 056009 (2018)
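The abstract above describes planing as introducing weights that smooth away the features in a chosen variable. A minimal Python sketch of one way to build such weights, assumed here rather than copied from the paper: weight each event by its bin width divided by its bin's event count, so the weighted histogram of that variable becomes flat. Applying this separately to signal and background (our assumption about the setup) removes that variable's shape information before a new network is trained. The function name is hypothetical.

```python
import bisect


def planing_weights(values, bin_edges):
    """Per-event weights that flatten the histogram of `values`.

    Each event in bin i gets weight width_i / count_i, so every occupied bin
    carries total weight equal to its width (a flat weighted density).
    Events outside [bin_edges[0], bin_edges[-1]] get weight 0. Empty bins
    stay empty, so they cannot be flattened.
    """
    n_bins = len(bin_edges) - 1

    def bin_index(x):
        if x == bin_edges[-1]:
            return n_bins - 1  # include the right edge in the last bin
        i = bisect.bisect_right(bin_edges, x) - 1
        return i if 0 <= i < n_bins else None

    indices = [bin_index(v) for v in values]
    counts = [0] * n_bins
    for i in indices:
        if i is not None:
            counts[i] += 1
    weights = []
    for i in indices:
        if i is None:
            weights.append(0.0)
        else:
            width = bin_edges[i + 1] - bin_edges[i]
            weights.append(width / counts[i])
    return weights
```

Retraining with these weights and comparing against the unweighted network then indicates how much discriminating power the planed variable carried.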

arXiv:1706.09451

doi 10.1007/JHEP02(2018)034

(Machine) Learning to Do More with Less

Authors: Timothy Cohen, Marat Freytsis, Bryan Ostdiek

Abstract: Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard "fully supervised" approach (that relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called "weakly supervised" technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail -- both analytically and numerically -- with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC.

Submitted 28 March, 2018; v1 submitted 28 June, 2017; originally announced June 2017.

Comments: 32 pages, 12 figures. Example code is provided at https://github.com/bostdiek/PublicWeaklySupervised. v3: Version published in JHEP, discussion added
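The abstract above describes weak supervision, where class ratios are the only discriminating information given during training. A minimal Python sketch of one simple loss of this kind follows: binary cross-entropy between a batch's mean prediction and the known signal fraction of the sample it was drawn from. This is an assumed illustration, not necessarily the loss in the paper or its example code, and the function name is hypothetical.

```python
import math


def weak_supervision_loss(predictions, signal_fraction, eps=1e-7):
    """Cross-entropy between a batch's mean prediction and its known signal fraction.

    predictions: per-event network outputs in (0, 1) for events drawn from a
    single mixed sample. Only the sample's signal fraction is known; no
    per-event labels enter the loss.
    """
    mean_pred = sum(predictions) / len(predictions)
    mean_pred = min(max(mean_pred, eps), 1 - eps)  # guard against log(0)
    f = signal_fraction
    return -(f * math.log(mean_pred) + (1 - f) * math.log(1 - mean_pred))
```

Because only the batch mean enters, a network trained on a single sample could simply output the constant fraction; per-event discrimination emerges only when training mixes samples with different known fractions.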

Showing 1–9 of 9 results for author: Ostdiek, B