-
RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey
Authors:
Nikhel Gupta,
Ray P. Norris,
Zeeshan Hayder,
Minh Huynh,
Lars Petersson,
X. Rosalind Wang,
Andrew M. Hopkins,
Heinz Andernach,
Yjan Gordon,
Simone Riggi,
Miranda Yew,
Evan J. Crawford,
Bärbel Koribalski,
Miroslav D. Filipović,
Anna D. Kapinśka,
Stanislav Shabala,
Tessa Vernstrom,
Joshua R. Marvil
Abstract:
We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio…
▽ More
We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5,000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within $3^{\prime \prime}$ of the ground truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritize Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211,625 radio sources, with 201,211 classified as compact and unresolved. The remaining 10,414 are categorized as extended radio morphologies, including 582 FR-I, 5,602 FR-II, 1,494 FR-x (uncertain whether FR-I or FR-II), 2,375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift
Authors:
Inigo V. Slijepcevic,
Anna M. M. Scaife,
Mike Walmsley,
Micah Bowles,
Ivy Wong,
Stanislav S. Shabala,
Hongming Tang
Abstract:
In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy…
▽ More
In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observered, highlighting the difficulty of applying SSL techniques under dataset shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop, and that using the Frechet Distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.
△ Less
Submitted 4 May, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Radio Galaxy Zoo: Unsupervised Clustering of Convolutionally Auto-encoded Radio-astronomical Images
Authors:
Nicholas O. Ralph,
Ray P. Norris,
Gu Fang,
Laurence A. F. Park,
Timothy J. Galvin,
Matthew J. Alger,
Heinz Andernach,
Chris Lintott,
Lawrence Rudnick,
Stanislav Shabala,
O. Ivy Wong
Abstract:
This paper demonstrates a novel and efficient unsupervised clustering method with the combination of a Self-Organising Map (SOM) and a convolutional autoencoder. The rapidly increasing volume of radio-astronomical data has increased demand for machine learning methods as solutions to classification and outlier detection. Major astronomical discoveries are unplanned and found in the unexpected, mak…
▽ More
This paper demonstrates a novel and efficient unsupervised clustering method with the combination of a Self-Organising Map (SOM) and a convolutional autoencoder. The rapidly increasing volume of radio-astronomical data has increased demand for machine learning methods as solutions to classification and outlier detection. Major astronomical discoveries are unplanned and found in the unexpected, making unsupervised machine learning highly desirable by operating without assumptions and labelled training data. Our approach shows SOM training time is drastically reduced and high-level features can be clustered by training on auto-encoded feature vectors instead of raw images. Our results demonstrate this method is capable of accurately separating outliers on a SOM with neighbourhood similarity and K-means clustering of radio-astronomical features complexity. We present this method as a powerful new approach to data exploration by providing a detailed understanding of the morphology and relationships of Radio Galaxy Zoo (RGZ) dataset image features which can be applied to new radio survey data.
△ Less
Submitted 6 June, 2019;
originally announced June 2019.