-
Building Machine Learning Challenges for Anomaly Detection in Science
Authors:
Elizabeth G. Campolongo,
Yuan-Tang Chou,
Ekaterina Govorkova,
Wahid Bhimji,
Wei-Lun Chao,
Chris Harris,
Shih-Chieh Hsu,
Hilmar Lapp,
Mark S. Neubauer,
Josephine Namayanja,
Aneesh Subramanian,
Philip Harris,
Advaith Anand,
David E. Carlyn,
Subhankar Ghosh,
Christopher Lawrence,
Eric Moreno,
Ryan Raikman,
Jiaman Wu,
Ziheng Zhang,
Bayu Adhi,
Mohammad Ahmadi Gharehtoragh,
Saúl Alonso Monsalve,
Marta Babicz,
Furqan Baig,
et al. (125 additional authors not shown)
Abstract:
Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Often, these anomalous events or objects that do not conform to the norms indicate that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data are inconsistent with, and outside the scope of, its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of larger, more compute-intensive challenges that can ultimately lead to scientific discovery.
Submitted 29 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
A Poisson Process AutoDecoder for X-ray Sources
Authors:
Yanke Song,
Victoria Ashley Villar,
Juan Rafael Martinez-Galarza,
Steven Dillmann
Abstract:
X-ray observing facilities, such as the Chandra X-ray Observatory and eROSITA, have detected millions of astronomical sources associated with high-energy phenomena. The arrival of photons as a function of time follows a Poisson process and can vary by orders of magnitude, presenting obstacles for common tasks such as source classification, physical property derivation, and anomaly detection. Previous work has either failed to directly capture the Poisson nature of the data or focused only on Poisson rate function reconstruction. In this work, we present the Poisson Process AutoDecoder (PPAD). PPAD is a neural field decoder that maps fixed-length latent features to continuous Poisson rate functions across energy band and time via unsupervised learning. PPAD reconstructs the rate function and yields a representation at the same time. We demonstrate the efficacy of PPAD via reconstruction, regression, classification and anomaly detection experiments using the Chandra Source Catalog.
Submitted 4 February, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
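The PPAD abstract above rests on the fact that photon arrival times follow a Poisson process with a time-varying rate. As a minimal sketch of that idea only (not the PPAD model or its training code), the log-likelihood of a set of arrival times under a candidate rate function is the sum of log-rates at the events minus the integrated rate over the observing window. The function name, the two example rate functions, and the event times below are hypothetical illustrations, not values from the paper.

```python
import math


def poisson_process_log_likelihood(event_times, rate_fn, t_start, t_end, n_steps=2000):
    """Log-likelihood of arrival times under an inhomogeneous Poisson process.

    log L = sum_i log(rate(t_i)) - integral of rate(t) over [t_start, t_end]
    The integral is approximated with the trapezoid rule on n_steps intervals.
    """
    dt = (t_end - t_start) / n_steps
    expected_counts = 0.0
    for k in range(n_steps):
        left = t_start + k * dt
        expected_counts += 0.5 * (rate_fn(left) + rate_fn(left + dt)) * dt
    log_rates = sum(math.log(rate_fn(t)) for t in event_times)
    return log_rates - expected_counts


# Hypothetical photon arrival times (seconds) clustered around t = 5.
events = [1.0, 4.8, 5.0, 5.1, 5.3, 9.0]


def flat_rate(t):
    # Constant-rate hypothesis: 0.6 counts per second (illustrative value).
    return 0.6


def flare_rate(t):
    # Low background plus a Gaussian-shaped flare centred at t = 5 (illustrative).
    return 0.3 + 3.0 * math.exp(-0.5 * ((t - 5.0) / 0.5) ** 2)


ll_flat = poisson_process_log_likelihood(events, flat_rate, 0.0, 10.0)
ll_flare = poisson_process_log_likelihood(events, flare_rate, 0.0, 10.0)
print(ll_flat, ll_flare)
```

For these made-up events the flare-shaped rate scores a higher log-likelihood than the constant rate, which is the kind of signal a rate-function model can be fitted against.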
-
Hyperluminous Supersoft X-Ray Sources in the Chandra Catalog
Authors:
Andrea Sacchi,
Kevin Paggeot,
Steven Dillmann,
Juan Rafael Martinez-Galarza,
Peter Kosec
Abstract:
Hyperluminous supersoft X-ray sources, such as bright extragalactic sources characterized by particularly soft X-ray spectra, offer a unique opportunity to study accretion onto supermassive black holes in extreme conditions. Examples of hyperluminous supersoft sources are tidal disruption events, systems exhibiting quasi-periodic eruptions, changing-look AGN, and anomalous nuclear transients. Although these objects are rare among the population of X-ray sources, we developed an efficient algorithm to identify promising candidates by exploiting archival observations. In this work, we present the results of a search for hyperluminous supersoft X-ray sources in the recently released Chandra catalog of serendipitous X-ray sources. This archival search was performed via both a manual implementation of the algorithm we developed and a novel machine-learning-based approach. The search identified a new tidal disruption event, which might have occurred in an intermediate-mass black hole. This event occurred between 2001 and 2002, making it one of the first tidal disruption events ever observed by Chandra.
Submitted 20 April, 2025; v1 submitted 31 January, 2025;
originally announced February 2025.
-
Representation Learning for Time-Domain High-Energy Astrophysics: Discovery of Extragalactic Fast X-ray Transient XRT 200515
Authors:
Steven Dillmann,
Juan Rafael Martínez-Galarza,
Roberto Soria,
Rosanne Di Stefano,
Vinay L. Kashyap
Abstract:
We present a novel representation learning method for downstream tasks like anomaly detection, unsupervised classification, and similarity searches in high-energy data sets. This enabled the discovery of a new extragalactic fast X-ray transient (FXT) in Chandra archival data, XRT 200515, a needle-in-the-haystack event and the first Chandra FXT of its kind. Recent serendipitous discoveries in X-ray astronomy, including FXTs from binary neutron star mergers and an extragalactic planetary transit candidate, highlight the need for systematic transient searches in X-ray archives. We introduce new event file representations, E-t maps and E-t-dt cubes, that effectively encode both temporal and spectral information, enabling the seamless application of machine learning to variable-length event file time series. Our unsupervised learning approach employs PCA or sparse autoencoders to extract low-dimensional, informative features from these data representations, followed by clustering in the embedding space with DBSCAN. New transients are identified within transient-dominant clusters or through nearest-neighbour searches around known transients, producing a catalogue of 3559 candidates (3447 flares and 112 dips). XRT 200515 exhibits unique temporal and spectral variability, including an intense, hard <10 s initial burst, followed by spectral softening in an ~800 s oscillating tail. We interpret XRT 200515 as either the first giant magnetar flare observed at low X-ray energies or the first extragalactic Type I X-ray burst from a faint, previously unknown low-mass X-ray binary in the LMC. Our method extends to data sets from other observatories such as XMM-Newton, Swift-XRT, eROSITA, Einstein Probe, and upcoming missions like AXIS.
Submitted 3 March, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
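The XRT 200515 abstract above describes turning variable-length event files into fixed-size E-t maps and then extracting low-dimensional features with PCA. The following is a rough sketch of that general idea, not the authors' implementation: the bin counts, the 0.5 to 7 keV energy range, the time normalisation, and the function names `et_map` and `pca_features` are all assumptions made for illustration, and the DBSCAN clustering step mentioned in the abstract is left out.

```python
import numpy as np


def et_map(times, energies, n_t=24, n_e=16, e_range=(0.5, 7.0)):
    """Bin an event list into a fixed-size, normalised energy-time histogram.

    Assumed conventions (not taken from the paper): arrival times are rescaled
    to [0, 1] per event file, energies are in keV with log-spaced bins, and
    events outside e_range are dropped.
    """
    times = np.asarray(times, dtype=float)
    energies = np.asarray(energies, dtype=float)
    span = times.max() - times.min()
    tau = (times - times.min()) / span if span > 0 else np.zeros_like(times)
    t_edges = np.linspace(0.0, 1.0, n_t + 1)
    e_edges = np.logspace(np.log10(e_range[0]), np.log10(e_range[1]), n_e + 1)
    hist, _, _ = np.histogram2d(tau, energies, bins=[t_edges, e_edges])
    total = hist.sum()
    # Normalise so event files with different photon counts are comparable.
    return hist / total if total > 0 else hist


def pca_features(maps, n_components=2):
    """Project flattened maps onto their leading principal components via SVD."""
    X = np.stack([m.ravel() for m in maps])
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T


# Three synthetic event files of different lengths (random, for illustration only).
rng = np.random.default_rng(0)
maps = [
    et_map(rng.uniform(0.0, 1000.0, n), rng.uniform(1.0, 5.0, n))
    for n in (50, 200, 1000)
]
features = pca_features(maps, n_components=2)
print(maps[0].shape, features.shape)
```

Every event file maps to the same array shape regardless of how many photons it contains, which is what lets standard dimensionality-reduction and clustering tools operate on the collection.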