-
What ZTF Saw Where Rubin Looked: Anomaly Hunting in DR23
Authors:
Maria V. Pruzhinskaya,
Anastasia D. Lavrukhina,
Timofey A. Semenikhin,
Alina A. Volnova,
Sreevarsha Sreejith,
Vadim V. Krushinsky,
Emmanuel Gangler,
Emille E. O. Ishida,
Matwey V. Kornilov,
Konstantin L. Malanchev
Abstract:
We present results from the SNAD VIII Workshop, during which we conducted the first systematic anomaly search in the ZTF fields also observed by LSSTComCam during Rubin Scientific Pipeline commissioning. Using the PineForest active anomaly detection algorithm, we analysed four selected fields (two galactic and two extragalactic) and visually inspected 400 candidates. As a result, we discovered six previously uncatalogued variable stars, including RS CVn, BY Draconis, ellipsoidal, and solar-type variables, and refined classifications and periods for six known objects. These results demonstrate the effectiveness of the SNAD anomaly detection pipeline and provide a preview of the discovery potential in the upcoming LSST data.
Submitted 8 July, 2025;
originally announced July 2025.
-
Signatures to help interpretability of anomalies
Authors:
Emmanuel Gangler,
Emille E. O. Ishida,
Matwey V. Kornilov,
Vladimir Korolev,
Anastasia Lavrukhina,
Konstantin Malanchev,
Maria V. Pruzhinskaya,
Etienne Russeil,
Timofey Semenikhin,
Sreevarsha Sreejith,
Alina A. Volnova
Abstract:
Machine learning is often viewed as a black box when it comes to understanding its output, be it a decision or a score. Automatic anomaly detection is no exception to this rule, and quite often the astronomer is left to independently analyze the data in order to understand why a given event is tagged as an anomaly. We introduce here the idea of an anomaly signature, whose aim is to help the interpretability of anomalies by highlighting which features contributed to the decision.
Submitted 19 June, 2025;
originally announced June 2025.
-
Building Russian Benchmark for Evaluation of Information Retrieval Models
Authors:
Grigory Kovalev,
Mikhail Tikhomirov,
Evgeny Kozhevnikov,
Max Kornilov,
Natalia Loukachevitch
Abstract:
We introduce RusBEIR, a comprehensive benchmark designed for zero-shot evaluation of information retrieval (IR) models in the Russian language. Comprising 17 datasets from various domains, it integrates adapted, translated, and newly created datasets, enabling systematic comparison of lexical and neural models. Our study highlights the importance of preprocessing for lexical models in morphologically rich languages and confirms BM25 as a strong baseline for full-document retrieval. Neural models, such as mE5-large and BGE-M3, demonstrate superior performance on most datasets, but face challenges with long-document retrieval due to input size constraints. RusBEIR offers a unified, open-source framework that promotes research in Russian-language information retrieval.
Submitted 17 April, 2025;
originally announced April 2025.
-
Exploring the Universe with SNAD: Anomaly Detection in Astronomy
Authors:
Alina A. Volnova,
Patrick D. Aleo,
Anastasia Lavrukhina,
Etienne Russeil,
Timofey Semenikhin,
Emmanuel Gangler,
Emille E. O. Ishida,
Matwey V. Kornilov,
Vladimir Korolev,
Konstantin Malanchev,
Maria V. Pruzhinskaya,
Sreevarsha Sreejith
Abstract:
SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the field of astrophysics. This paper provides a review of the SNAD project and summarizes the advancements and achievements made by the team over several years.
Submitted 24 October, 2024;
originally announced October 2024.
-
Coniferest: a complete active anomaly detection framework
Authors:
M. V. Kornilov,
V. S. Korolev,
K. L. Malanchev,
A. D. Lavrukhina,
E. Russeil,
T. A. Semenikhin,
E. Gangler,
E. E. O. Ishida,
M. V. Pruzhinskaya,
A. A. Volnova,
S. Sreejith
Abstract:
We present coniferest, an open-source, general-purpose active anomaly detection framework written in Python. The package design and implemented algorithms are described. Currently, static outlier detection analysis is supported via the Isolation forest algorithm. Moreover, Active Anomaly Discovery (AAD) and Pineforest algorithms are available to tackle active anomaly detection problems. The algorithms and package performance are evaluated on a series of synthetic datasets. We also describe a few success cases which resulted from applying the package to real astronomical data in active anomaly detection tasks within the SNAD project.
Submitted 15 November, 2024; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Active Anomaly Detection for time-domain discoveries
Authors:
Emille E. O. Ishida,
Matwey V. Kornilov,
Konstantin L. Malanchev,
Maria V. Pruzhinskaya,
Alina A. Volnova,
Vladimir S. Korolev,
Florian Mondon,
Sreevarsha Sreejith,
Anastasia Malancheva,
Shubhomoy Das
Abstract:
We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses objects which can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new piece of information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional Isolation Forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery (AAD) algorithm to two data sets: simulated light curves from the PLAsTiCC challenge and real light curves from the Open Supernova Catalog. We compare the AAD results to those of a static IF. For both methods, we performed a detailed analysis for all objects with the ~2% highest anomaly scores. We show that, in the real data scenario, AAD was able to identify ~80% more true anomalies than the IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.
Submitted 14 July, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Maximum likelihood estimation for disk image parameters
Authors:
Matwey V. Kornilov
Abstract:
We present a novel technique for estimating the parameters of a disk (the centre and the radius) from its 2D image. It is based on the maximum likelihood approach, utilising both edge pixel coordinates and the image intensity gradients. We emphasise the following advantages of our likelihood model. It has closed-form formulae for parameter estimation and therefore requires fewer computational resources than iterative algorithms. The likelihood model naturally distinguishes the outer and inner annulus edges. The proposed technique was evaluated on both synthetic and real data.
Submitted 18 March, 2020; v1 submitted 24 July, 2019;
originally announced July 2019.