-
A machine learning approach to galaxy properties: joint redshift-stellar mass probability distributions with Random Forest
Authors:
S. Mucesh,
W. G. Hartley,
A. Palmese,
O. Lahav,
L. Whiteway,
A. F. L. Bluck,
A. Alarcon,
A. Amon,
K. Bechtol,
G. M. Bernstein,
A. Carnero Rosell,
M. Carrasco Kind,
A. Choi,
K. Eckert,
S. Everett,
D. Gruen,
R. A. Gruendl,
I. Harrison,
E. M. Huff,
N. Kuropatkin,
I. Sevilla-Noarbe,
E. Sheldon,
B. Yanny,
M. Aguena,
S. Allam
, et al. (50 additional authors not shown)
Abstract:
We demonstrate that highly accurate joint redshift-stellar mass probability distribution functions (PDFs) can be obtained using the Random Forest (RF) machine learning (ML) algorithm, even with few photometric bands available. As an example, we use the Dark Energy Survey (DES), combined with the COSMOS2015 catalogue for redshifts and stellar masses. We build two ML models: one containing deep phot…
▽ More
We demonstrate that highly accurate joint redshift-stellar mass probability distribution functions (PDFs) can be obtained using the Random Forest (RF) machine learning (ML) algorithm, even with few photometric bands available. As an example, we use the Dark Energy Survey (DES), combined with the COSMOS2015 catalogue for redshifts and stellar masses. We build two ML models: one containing deep photometry in the $griz$ bands, and the second reflecting the photometric scatter present in the main DES survey, with carefully constructed representative training data in each case. We validate our joint PDFs for $10,699$ test galaxies by utilizing the copula probability integral transform and the Kendall distribution function, and their univariate counterparts to validate the marginals. Benchmarked against a basic set-up of the template-fitting code BAGPIPES, our ML-based method outperforms template fitting on all of our predefined performance metrics. In addition to accuracy, the RF is extremely fast, able to compute joint PDFs for a million galaxies in just under $6$ min with consumer computer hardware. Such speed enables PDFs to be derived in real time within analysis codes, solving potential storage issues. As part of this work we have developed GALPRO, a highly intuitive and efficient Python package to rapidly generate multivariate PDFs on-the-fly. GALPRO is documented and available for researchers to use in their cosmology and galaxy evolution studies.
△ Less
Submitted 19 February, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Machine Learning for Searching the Dark Energy Survey for Trans-Neptunian Objects
Authors:
B. Henghes,
O. Lahav,
D. W. Gerdes,
E. Lin,
R. Morgan,
T. M. C. Abbott,
M. Aguena,
S. Allam,
J. Annis,
S. Avila,
E. Bertin,
D. Brooks,
D. L. Burke,
A. CarneroRosell,
M. CarrascoKind,
J. Carretero,
C. Conselice,
M. Costanzi,
L. N. da Costa,
J. DeVicente,
S. Desai,
H. T. Diehl,
P. Doel,
S. Everett,
I. Ferrero
, et al. (34 additional authors not shown)
Abstract:
In this paper we investigate how implementing machine learning could improve the efficiency of the search for Trans-Neptunian Objects (TNOs) within Dark Energy Survey (DES) data when used alongside orbit fitting. The discovery of multiple TNOs that appear to show a similarity in their orbital parameters has led to the suggestion that one or more undetected planets, an as yet undiscovered "Planet 9…
▽ More
In this paper we investigate how implementing machine learning could improve the efficiency of the search for Trans-Neptunian Objects (TNOs) within Dark Energy Survey (DES) data when used alongside orbit fitting. The discovery of multiple TNOs that appear to show a similarity in their orbital parameters has led to the suggestion that one or more undetected planets, an as yet undiscovered "Planet 9", may be present in the outer Solar System. DES is well placed to detect such a planet and has already been used to discover many other TNOs. Here, we perform tests on eight different supervised machine learning algorithms, using a dataset consisting of simulated TNOs buried within real DES noise data. We found that the best performing classifier was the Random Forest which, when optimised, performed well at detecting the rare objects. We achieve an area under the receiver operating characteristic (ROC) curve, (AUC) $= 0.996 \pm 0.001$. After optimizing the decision threshold of the Random Forest, we achieve a recall of 0.96 while maintaining a precision of 0.80. Finally, by using the optimized classifier to pre-select objects, we are able to run the orbit-fitting stage of our detection pipeline five times faster.
△ Less
Submitted 10 December, 2020; v1 submitted 27 September, 2020;
originally announced September 2020.
-
Enabling real-time multi-messenger astrophysics discoveries with deep learning
Authors:
E. A. Huerta,
Gabrielle Allen,
Igor Andreoni,
Javier M. Antelis,
Etienne Bachelet,
Bruce Berriman,
Federica Bianco,
Rahul Biswas,
Matias Carrasco,
Kyle Chard,
Minsik Cho,
Philip S. Cowperthwaite,
Zachariah B. Etienne,
Maya Fishbach,
Francisco Förster,
Daniel George,
Tom Gibbs,
Matthew Graham,
William Gropp,
Robert Gruendl,
Anushri Gupta,
Roland Haas,
Sarah Habib,
Elise Jennings,
Margaret W. G. Johnson
, et al. (35 additional authors not shown)
Abstract:
Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravit…
▽ More
Multi-messenger astrophysics is a fast-growing, interdisciplinary field that combines data, which vary in volume and speed of data processing, from many different instruments that probe the Universe using different cosmic messengers: electromagnetic waves, cosmic rays, gravitational waves and neutrinos. In this Expert Recommendation, we review the key challenges of real-time observations of gravitational wave sources and their electromagnetic and astroparticle counterparts, and make a number of recommendations to maximize their potential for scientific discovery. These recommendations refer to the design of scalable and computationally efficient machine learning algorithms; the cyber-infrastructure to numerically simulate astrophysical sources, and to process and interpret multi-messenger astrophysics data; the management of gravitational wave detections to trigger real-time alerts for electromagnetic and astroparticle follow-ups; a vision to harness future developments of machine learning and cyber-infrastructure resources to cope with the big-data requirements; and the need to build a community of experts to realize the goals of multi-messenger astrophysics.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.
-
Deep Learning at Scale for the Construction of Galaxy Catalogs in the Dark Energy Survey
Authors:
Asad Khan,
E. A. Huerta,
Sibo Wang,
Robert Gruendl,
Elise Jennings,
Huihuo Zheng
Abstract:
The scale of ongoing and future electromagnetic surveys pose formidable challenges to classify astronomical objects. Pioneering efforts on this front include citizen science campaigns adopted by the Sloan Digital Sky Survey (SDSS). SDSS datasets have been recently used to train neural network models to classify galaxies in the Dark Energy Survey (DES) that overlap the footprint of both surveys. He…
▽ More
The scale of ongoing and future electromagnetic surveys pose formidable challenges to classify astronomical objects. Pioneering efforts on this front include citizen science campaigns adopted by the Sloan Digital Sky Survey (SDSS). SDSS datasets have been recently used to train neural network models to classify galaxies in the Dark Energy Survey (DES) that overlap the footprint of both surveys. Herein, we demonstrate that knowledge from deep learning algorithms, pre-trained with real-object images, can be transferred to classify galaxies that overlap both SDSS and DES surveys, achieving state-of-the-art accuracy $\gtrsim99.6\%$. We demonstrate that this process can be completed within just eight minutes using distributed training. While this represents a significant step towards the classification of DES galaxies that overlap previous surveys, we need to initiate the characterization of unlabelled DES galaxies in new regions of parameter space. To accelerate this program, we use our neural network classifier to label over ten thousand unlabelled DES galaxies, which do not overlap previous surveys. Furthermore, we use our neural network model as a feature extractor for unsupervised clustering and find that unlabeled DES images can be grouped together in two distinct galaxy classes based on their morphology, which provides a heuristic check that the learning is successfully transferred to the classification of unlabelled DES images. We conclude by showing that these newly labeled datasets can be combined with unsupervised recursive training to create large-scale DES galaxy catalogs in preparation for the Large Synoptic Survey Telescope era.
△ Less
Submitted 8 July, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.