-
Predicting Halo Formation Time Using Machine Learning
Authors:
Atulit Srivastava,
Weiguang Cui,
Daniel de Andres,
Jesse B. Golden-Marx,
Elena Rasia,
Ying Zu
Abstract:
Context: Halo formation time, which quantifies the mass assembly history of dark-matter halos, directly impacts galaxy properties and evolution. Although not directly observable, it can be inferred through proxies such as star formation history or galaxy spatial distributions. Recent advances in machine learning enable more accurate predictions of halo formation time using galaxy and halo properties.
Aims: This study investigates a machine learning-based approach to predicting halo formation time, defined as the epoch when a halo accretes half of its current mass, using both halo and baryonic properties derived from cosmological simulations. By incorporating properties of the brightest cluster galaxy located at the cluster center, its associated intracluster light component, and satellite galaxies, we aim to surpass analytical predictions, improve prediction accuracy, and identify the properties that provide the best proxy for the halo assembly history.
Methods: Using The Three Hundred cosmological simulations, we train Random Forest (RF) and Convolutional Neural Network (CNN) models on halo and baryonic properties, such as mass, concentration, stellar and gas masses, and features of the brightest cluster galaxy and intracluster light. CNN models are trained on two-dimensional radial property maps. We also construct simple linear models using only observationally accessible features.
Results: RF models show median biases of 4%-9% with standard deviations of 20%. CNN models reduce the median bias to <4%, although they have higher scatter. Simple linear models using a limited number of observables achieve prediction accuracy comparable to that of the RF models. Traditional relations between halo formation time and mass/concentration are preserved.
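The Aims define formation time as the epoch at which a halo has assembled half of its present-day mass. As a minimal sketch, assuming the main-progenitor mass accretion history is tabulated at snapshot redshifts starting from $z=0$, that epoch ($z_{1/2}$) can be located by linear interpolation between the two snapshots that bracket half the final mass. The function name and the choice to interpolate in redshift are illustrative assumptions, not necessarily what the paper does.

```python
import numpy as np

def formation_redshift(z, mass):
    """Return z_1/2, the redshift at which the main progenitor first falls
    below half of its z=0 mass, by linear interpolation between snapshots.

    z    : snapshot redshifts, increasing from z[0] = 0
    mass : main-progenitor mass at each snapshot (same length as z)
    """
    z = np.asarray(z, dtype=float)
    mass = np.asarray(mass, dtype=float)
    half = 0.5 * mass[0]
    # Walk back in time to the first snapshot where the mass is below half.
    below = np.nonzero(mass < half)[0]
    if below.size == 0:
        raise ValueError("history never drops below half the final mass")
    i = below[0]  # i >= 1 because mass[0] > half for a positive mass
    # Interpolate between snapshot i-1 (mass >= half) and i (mass < half).
    frac = (mass[i - 1] - half) / (mass[i - 1] - mass[i])
    return z[i - 1] + frac * (z[i] - z[i - 1])
```

For example, a history of $[10^{15}, 7\times10^{14}, 4\times10^{14}, 2\times10^{14}]$ at $z = [0, 0.5, 1, 1.5]$ gives $z_{1/2} = 5/6$.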
Submitted 19 April, 2025;
originally announced April 2025.
-
Deep Learning generated observations of galaxy clusters from dark-matter-only simulations
Authors:
Andrés Caro,
Daniel de Andres,
Weiguang Cui,
Gustavo Yepes,
Marco De Petris,
Antonio Ferragamo,
Félicien Schiltz,
Amélie Nef
Abstract:
Hydrodynamical simulations play a fundamental role in modern cosmological research, serving as a crucial bridge between theoretical predictions and observational data. However, due to their computational cost, these simulations are currently constrained to relatively small volumes. This study therefore investigates the feasibility of using dark matter-only simulations to generate observable maps of galaxy clusters with a deep learning approach based on the U-Net architecture. We focus on reconstructing Compton-y parameter maps (SZ maps) and bolometric X-ray surface brightness maps (X-ray maps) from total mass density maps. We use data from \textsc{The Three Hundred} simulations, selecting galaxy clusters in the mass range $10^{13.5} h^{-1}M_{\odot}\leq M_{200} \leq 10^{15.5} h^{-1}M_{\odot}$. Although the machine learning models make no assumptions about the baryonic matter, a notable limitation is their dependence on the physics implemented in the hydrodynamical simulations. To evaluate the reliability of the generated observable maps, we employ various metrics and compare the observable-mass scaling relations. For clusters with masses greater than $2 \times 10^{14} h^{-1} M_{\odot}$, the predictions show excellent agreement with the ground-truth datasets, with percentage errors averaging (0.5 $\pm$ 0.1)\% for the parameters of the scaling laws.
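Comparing observable-mass scaling relations usually means fitting a power law $Y = A\,(M/M_{\rm piv})^{B}$ to each set of maps and comparing the fitted parameters. The sketch below assumes an ordinary least-squares fit in log space and a hypothetical pivot mass of $10^{14}$; the abstract does not specify the fitting method, so this is one reasonable choice rather than the authors' procedure.

```python
import numpy as np

def fit_power_law(mass, observable, pivot=1e14):
    """Fit log10(Y) = log10(A) + B * log10(M / pivot) by least squares.

    Returns (log10_A, B, scatter), where scatter is the rms of the
    log10 residuals about the best-fitting line.
    """
    x = np.log10(np.asarray(mass, dtype=float) / pivot)
    y = np.log10(np.asarray(observable, dtype=float))
    B, logA = np.polyfit(x, y, 1)  # coefficients come back slope first
    resid = y - (logA + B * x)
    return logA, B, np.sqrt(np.mean(resid**2))

def percent_error(pred, true):
    """Percentage difference of a fitted parameter relative to ground truth."""
    return 100.0 * abs(pred - true) / abs(true)
```

Fitting the predicted and the ground-truth observables separately and passing each pair of parameters to `percent_error` yields a parameter-level comparison of the kind reported in the abstract.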
Submitted 11 March, 2025; v1 submitted 5 October, 2024;
originally announced October 2024.
-
Generating Galaxy Clusters Mass Density Maps from Mock Multiview Images via Deep Learning
Authors:
Daniel de Andres,
Weiguang Cui,
Gustavo Yepes,
Marco De Petris,
Gianmarco Aversano,
Antonio Ferragamo,
Federico De Luca,
A. Jiménez Muñoz
Abstract:
Galaxy clusters are composed of dark matter, gas and stars. Their dark matter component, which amounts to around 80\% of the total mass, cannot be observed directly but can be traced by the distribution of diffuse gas and member galaxies. In this work, we aim to infer the cluster's projected total mass distribution from mock observational data, i.e. stars, Sunyaev-Zeldovich, and X-ray, by training deep learning models. To this end, we have created a multiview image dataset from {\sc{The Three Hundred}} simulation that is optimal for training machine learning models. We further study deep learning architectures based on the U-Net, covering both single-input and multi-input models. We show that the predicted mass distribution agrees well with the true one.
Submitted 9 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
3D scaling laws and projection effects in The300-NIKA2 Sunyaev-Zeldovich Large Program Twin Samples
Authors:
A. Paliwal,
W. Cui,
D. de Andrés,
M. De Petris,
A. Ferragamo,
C. Hanser,
J. -F. Macías-Pérez,
F. Mayet,
A. Moyer-Anin,
M. Muñoz-Echeverría,
L. Perotto,
E. Rasia,
G. Yepes
Abstract:
The abundance of galaxy clusters as a function of mass and redshift is a well-known cosmological probe. The cluster mass is a key parameter for studies that aim to constrain cosmological parameters using galaxy clusters, making it critical to understand and properly account for the errors in its estimates. Consequently, it is important to correctly calibrate scaling relations between observables, such as the integrated Compton parameter, and the mass of the cluster.
The NIKA2 Sunyaev-Zeldovich Large Program (LPSZ) makes it possible to map the intracluster medium profiles in the millimetre band in great detail (angular resolution of $11''$ and $17''$ at 1.2 and 2 mm, respectively) and hence to estimate the cluster hydrostatic mass more precisely than previous SZ observations. However, certain systematic effects can only be accounted for with simulations. For this purpose, we employ THE THREE HUNDRED simulations, which have been run with a range of physics modules to simulate galaxy clusters. The so-called twin samples are constructed by selecting synthetic galaxy clusters with properties close to the observational targets of the LPSZ. In particular, we use the Compton parameter maps and projected total mass maps of these twin samples along 29 different lines of sight. We investigate the scatter that projection induces on the total masses. Finally, we use the statistics across the different lines of sight to construct a kind of 3D scaling law between the integrated Compton parameter, total mass, and overdensity of the galaxy clusters, in order to determine the overdensity least affected by projection.
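The final step, choosing the overdensity least affected by projection, can be illustrated with a minimal sketch. It assumes a hypothetical array of projected-to-true mass ratios indexed by cluster, line of sight and overdensity, measures the fractional scatter across lines of sight for each cluster, and takes the median over the sample. The array layout, the ratio statistic and the median summary are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def least_projected_overdensity(ratio, overdensities):
    """Pick the overdensity whose mass estimates vary least with viewing angle.

    ratio         : array (n_clusters, n_los, n_delta) of projected/true mass
    overdensities : sequence of length n_delta, e.g. [2500, 500, 200]
    Returns the chosen overdensity and the per-overdensity median scatter.
    """
    ratio = np.asarray(ratio, dtype=float)
    # Fractional scatter across lines of sight, per cluster and overdensity.
    per_cluster = np.std(ratio, axis=1) / np.mean(ratio, axis=1)
    # Summarise over the cluster sample with the median.
    scatter = np.median(per_cluster, axis=0)
    return overdensities[int(np.argmin(scatter))], scatter
```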
Submitted 4 April, 2024;
originally announced April 2024.
-
Identifying Galaxy Cluster Mergers with Deep Neural Networks using Idealized Compton-y and X-ray maps
Authors:
Ashleigh R. Arendt,
Yvette C. Perrott,
Ana Contreras-Santos,
Daniel de Andres,
Weiguang Cui,
Douglas Rennehan
Abstract:
We present a novel approach to identifying galaxy clusters that are undergoing a merger using deep learning. This paper uses massive galaxy clusters spanning $0 \leq z \leq 2$ from \textsc{The Three Hundred} project, a suite of hydrodynamic re-simulations of 324 large galaxy clusters. Mock, idealised Compton-{\it y} and X-ray maps were constructed for the sample, extending out to a radius of $2R_{200}$. The idealised nature of these maps means they do not include observational effects such as foreground or background astrophysical objects, spatial resolution limits, or restrictions on X-ray energy bands. Half of the maps belong to a merging population, defined by a mass increase $\Delta M/M \geq 0.75$, and the other half serve as a control, relaxed population. We employ a convolutional neural network architecture and train the model to classify clusters into one of the two groups. The best-performing model was able to correctly distinguish between the two populations with a balanced accuracy (BA) and recall of 0.77, ROC-AUC of 0.85, PR-AUC of 0.55 and $F_{1}$ score of 0.53. Using a multichannel model instead of a single-channel model, we obtain a 3\% improvement in BA score and a 6\% improvement in $F_{1}$ score. We use a saliency interpretation approach to discern the regions most important to each classification decision. By analysing radially binned saliency values, we find a preference for using regions out to larger distances for mergers than for non-mergers, beyond $\sim1.2 R_{200}$ and $\sim0.7 R_{200}$ for SZ and X-ray, respectively.
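The classification scores quoted above follow standard definitions, sketched here in plain Python with mergers as the positive class (an assumption; the abstract does not state which class recall refers to). Balanced accuracy is the mean of the per-class recalls, and $F_1$ is the harmonic mean of precision and recall on the positive class.

```python
def classification_scores(y_true, y_pred):
    """Balanced accuracy, recall, precision and F1 for a binary classifier.

    Labels: 1 = merging cluster (positive class), 0 = relaxed control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn)          # true-positive rate on mergers
    specificity = tn / (tn + fp)     # true-negative rate on relaxed clusters
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"balanced_accuracy": 0.5 * (recall + specificity),
            "recall": recall, "precision": precision, "f1": f1}
```

Rearranging $F_1 = 2PR/(P+R)$ gives $P = F_1 R/(2R - F_1)$, so if the quoted recall of 0.77 and $F_1$ of 0.53 refer to the same (merger) class, they imply a precision of roughly 0.40.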
Submitted 14 March, 2024;
originally announced March 2024.
-
The Three Hundred Project: Mapping The Matter Distribution in Galaxy Clusters Via Deep Learning from Multiview Simulated Observations
Authors:
Daniel de Andres,
Weiguang Cui,
Gustavo Yepes,
Marco De Petris,
Antonio Ferragamo,
Federico De Luca,
Gianmarco Aversano,
Douglas Rennehan
Abstract:
A galaxy cluster, the most massive gravitationally bound object in the Universe, is dominated by dark matter, which can only be investigated through its interaction with the luminous baryons under simplifying assumptions that introduce an undesired bias. In this work, we propose, {\it for the first time}, a deep learning method based on the U-Net architecture to directly infer the projected total mass density map from idealised multi-wavelength observations of simulated galaxy clusters. The model is trained with a large dataset of simulated images of clusters from {\sc The Three Hundred Project}. Although Machine Learning (ML) models do not depend on assumptions about the dynamics of the intra-cluster medium, our whole method relies on the choice of the physics implemented in the hydrodynamic simulations, which is a limitation of the method. Using different metrics to assess the fidelity of the inferred density map, we show that the predicted total mass distribution is in very good agreement with the true simulated cluster. Consequently, the integrated halo mass is almost unbiased, at around 1 per cent for the best multiview result, and the scatter is also very small, within about 3 per cent. This result suggests that this ML method provides an alternative and more accessible approach to reconstructing the overall matter distribution in galaxy clusters, which can complement the lensing method.
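The integrated-mass bias can be computed from any projected density map. As a minimal sketch, assuming a square map of surface mass density centred on the cluster and a circular aperture (the units, centring and aperture radius are illustrative assumptions), the aperture mass is the sum of pixel values inside the radius times the pixel area, and the bias is the fractional difference between predicted and true aperture masses.

```python
import numpy as np

def aperture_mass(sigma_map, pixel_size, radius):
    """Projected mass inside a circular aperture centred on the map centre.

    sigma_map  : 2D array of surface mass density (mass per unit area)
    pixel_size : side length of one pixel, in the same length unit as radius
    radius     : aperture radius
    """
    ny, nx = sigma_map.shape
    y, x = np.indices((ny, nx))
    # Distance of each pixel centre from the map centre, in physical units.
    r = np.hypot(x - (nx - 1) / 2, y - (ny - 1) / 2) * pixel_size
    return sigma_map[r <= radius].sum() * pixel_size**2

def relative_bias(m_pred, m_true):
    """Fractional bias of the predicted mass, e.g. 0.01 for +1 per cent."""
    return m_pred / m_true - 1.0
```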
Submitted 16 January, 2024; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Galaxy cluster mass bias from projected mass maps: The Three Hundred-NIKA2 LPSZ twin samples
Authors:
M. Muñoz-Echeverría,
J. F. Macías-Pérez,
E. Artis,
W. Cui,
D. de Andres,
F. De Luca,
M. De Petris,
A. Ferragamo,
C. Giocoli,
C. Hanser,
F. Mayet,
M. Meneghetti,
A. Moyer-Anin,
A. Paliwal,
L. Perotto,
E. Rasia,
G. Yepes
Abstract:
The determination of the mass of galaxy clusters from observations is subject to systematic uncertainties. Beyond the errors due to instrumental and observational systematic effects, in this work we investigate the bias introduced by modelling assumptions. In particular, we consider the reconstruction of the mass of galaxy clusters from convergence maps employing spherical mass density models. We made use of The Three Hundred simulations, selecting clusters in the same redshift and mass range as the NIKA2 Sunyaev-Zel'dovich Large Programme sample: $3 \leq M_{500}/ 10^{14} \mathrm{M}_{\odot} \leq 10$ and $0.5 \leq z \leq 0.9$. We studied different modelling and intrinsic uncertainties that should be accounted for when using single-cluster mass estimates for scaling relations. We confirm that the orientation of clusters and the radial ranges considered for the fit have an important impact on the mass bias. The effect of the projection adds uncertainties of the order of $10\%$ to $16\%$ to the mass estimates. We also find that the scatter from cluster to cluster in the mass bias when using spherical mass models is less than $9\%$ of the true mass of the clusters.
Submitted 2 December, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
A Deep Learning Approach to Infer Galaxy Cluster Masses from Planck Compton$-y$ parameter maps
Authors:
Daniel de Andres,
Weiguang Cui,
Florian Ruppin,
Marco De Petris,
Gustavo Yepes,
Giulia Gianfagna,
Ichraf Lahouli,
Gianmarco Aversano,
Romain Dupuis,
Mahmoud Jarraya,
Jesús Vega-Ferrero
Abstract:
Galaxy clusters are useful laboratories to investigate the evolution of the Universe, and accurately measuring their total masses allows us to constrain important cosmological parameters. However, estimating mass from observations that use different methods and spectral bands introduces various systematic errors. This paper evaluates the use of a Convolutional Neural Network (CNN) to reliably and accurately infer the masses of galaxy clusters from the Compton-y parameter maps provided by the Planck satellite. The CNN is trained with mock images generated from hydrodynamic simulations of galaxy clusters, with Planck's observational limitations taken into account. We observe that the CNN approach is not subject to the usual observational assumptions, and so is not affected by the same biases. By applying the trained CNNs to the real Planck maps, we find cluster masses compatible with Planck measurements within a 15% bias. Finally, we show that this mass bias can be explained by the well-known hydrostatic equilibrium assumption in Planck masses, and the different parameters in the Y500-M500 scaling laws. This work highlights that CNNs, supported by hydrodynamic simulations, are a promising and independent tool for estimating cluster masses with high accuracy, which can be extended to other surveys as well as to observations in other bands.
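The comparison with Planck masses is commonly expressed through the hydrostatic mass bias $b$, defined by $M_{\rm HE} = (1-b)\,M_{\rm true}$. A minimal sketch, assuming paired per-cluster masses and treating the CNN estimate as the reference mass (an illustrative assumption, not necessarily the paper's convention), estimates $b$ from the median mass ratio:

```python
import numpy as np

def hydrostatic_bias(m_planck, m_cnn):
    """Estimate b in M_Planck = (1 - b) * M_CNN from paired cluster masses.

    Uses the median of the per-cluster ratios, which is robust to outliers.
    """
    ratio = np.asarray(m_planck, dtype=float) / np.asarray(m_cnn, dtype=float)
    return 1.0 - np.median(ratio)
```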
Submitted 18 October, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
The Three Hundred project: A Machine Learning method to infer clusters of galaxies mass radial profiles from mock Sunyaev-Zel'dovich maps
Authors:
A. Ferragamo,
D. de Andres,
A. Sbriglio,
W. Cui,
M. De Petris,
G. Yepes,
R. Dupuis,
M. Jarraya,
I. Lahouli,
F. De Luca,
G. Gianfagna,
E. Rasia
Abstract:
We develop a machine learning algorithm to infer the 3D cumulative radial profiles of total and gas mass in galaxy clusters from thermal Sunyaev-Zel'dovich effect maps. We generate around 73,000 mock images along various lines of sight using 2,522 simulated clusters from The Three Hundred project at redshift $z< 0.12$ and train a model that combines an autoencoder and a random forest. Without making any prior assumptions about the hydrostatic equilibrium of the clusters, the model is capable of reconstructing the total mass profile as well as the gas mass profile, which is responsible for the SZ effect. We show that the recovered profiles are unbiased with a scatter of about $10\%$, slightly increasing towards the core and the outskirts of the cluster. We selected clusters in the mass range $10^{13.5} \leq M_{200}/(h^{-1}M_{\odot}) \leq 10^{15.5}$, spanning different dynamical states, from relaxed to disturbed halos. We verify that both the accuracy and precision of this method show a slight dependence on the dynamical state, but not on the cluster mass. To further verify the consistency of our model, we fit the inferred total mass profiles with an NFW model and compare the concentration values with those of the true profiles. We note that the inferred profiles are unbiased for higher concentration values, reproducing a trustworthy mass-concentration relation. The comparison with a widely used mass estimation technique, such as hydrostatic equilibrium, demonstrates that our method recovers the total mass without the bias caused by non-thermal motions of the gas.
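Fitting a cumulative mass profile with an NFW model can be sketched as follows. Assuming $M_{200}$ and $R_{200}$ are known and only the concentration $c$ is free, the enclosed NFW mass is $M(<r) = M_{200}\,\mu(c\,r/R_{200})/\mu(c)$ with $\mu(x) = \ln(1+x) - x/(1+x)$, and $c$ can be found by a grid search on log-mass residuals. The grid range, the log-space cost and the fixed $R_{200}$ are assumptions of this example; the paper may fit the profiles differently.

```python
import numpy as np

def nfw_enclosed(r, m200, r200, c):
    """Cumulative NFW mass M(<r), normalised so that M(<r200) = m200."""
    def mu(x):
        return np.log1p(x) - x / (1.0 + x)
    return m200 * mu(c * np.asarray(r) / r200) / mu(c)

def fit_concentration(r, mass, m200, r200, c_grid=np.linspace(1.0, 15.0, 1401)):
    """Grid-search the NFW concentration that best matches a cumulative profile.

    Minimises squared residuals in log mass, which weights the core and the
    outskirts more evenly than residuals in linear mass would.
    """
    log_mass = np.log(np.asarray(mass, dtype=float))
    cost = [np.sum((np.log(nfw_enclosed(r, m200, r200, c)) - log_mass) ** 2)
            for c in c_grid]
    return float(c_grid[int(np.argmin(cost))])
```

Comparing the concentration fitted to an inferred profile with the one fitted to the true profile gives the kind of check described in the abstract.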
Submitted 1 February, 2023; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Machine Learning methods to estimate observational properties of galaxy clusters in large volume cosmological N-body simulations
Authors:
Daniel de Andres,
Gustavo Yepes,
Federico Sembolini,
Gonzalo Martínez-Muñoz,
Weiguang Cui,
Francisco Robledo,
Chia-Hsun Chuang,
Elena Rasia
Abstract:
In this paper we study the applicability of a set of supervised machine learning (ML) models specifically trained to infer observation-related properties of the baryonic component (stars and gas) from a set of features of dark-matter-only cluster-size halos. The training set is built from THE THREE HUNDRED project, which consists of a series of zoomed hydrodynamical simulations of cluster-size regions extracted from the 1 Gpc volume MultiDark dark-matter-only simulation (MDPL2). We use as target variables a set of baryonic properties of the intracluster gas and stars derived from the hydrodynamical simulations and correlate them with the properties of the dark matter halos from the MDPL2 N-body simulation. The different ML models are trained on this database and subsequently used to infer the same baryonic properties for the whole range of cluster-size halos identified in MDPL2. We also test the robustness of the models' predictions against the mass resolution of the dark matter halos and conclude that the inferred baryonic properties are rather insensitive to it, even when the DM halos are resolved with almost an order of magnitude fewer particles. We conclude that the ML models presented in this paper can be used as an accurate and computationally efficient tool for populating cluster-size halos with observation-related baryonic properties in large volume N-body simulations, making them more valuable for comparison with full-sky galaxy cluster surveys at different wavelengths. We make the best ML trained model publicly available.
Submitted 10 November, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
\textsc{The Three Hundred} project: The \textsc{Gizmo-Simba} run
Authors:
Weiguang Cui,
Romeel Dave,
Alexander Knebe,
Elena Rasia,
Meghan Gray,
Frazer Pearce,
Chris Power,
Gustavo Yepes,
Dhayaa Anbajagane,
Daniel Ceverino,
Ana Contreras-Santos,
Daniel de Andres,
Marco De Petris,
Stefano Ettori,
Roan Haggar,
Qingyang Li,
Yang Wang,
Xiaohu Yang,
Stefano Borgani,
Klaus Dolag,
Ying Zu,
Ulrike Kuchner,
Rodrigo Cañas,
Antonio Ferragamo,
Giulia Gianfagna
Abstract:
We introduce \textsc{Gizmo-Simba}, a new suite of galaxy cluster simulations within \textsc{The Three Hundred} project. \textsc{The Three Hundred} consists of zoom re-simulations of 324 clusters with $M_{200}\gtrsim 10^{14.8}M_\odot$ drawn from the MultiDark-Planck $N$-body simulation, run using several hydrodynamic and semi-analytic codes. The \textsc{Gizmo-Simba} suite adds a state-of-the-art galaxy formation model based on the highly successful {\sc Simba} simulation, mildly re-calibrated to match $z=0$ cluster stellar properties. Compared with \textsc{The Three Hundred} zooms run with \textsc{Gadget-X}, we find intrinsic differences in the evolution of the stellar and gas mass fractions, BCG ages, and galaxy colour-magnitude diagrams, with \textsc{Gizmo-Simba} generally providing a good match to available data at $z \approx 0$. \textsc{Gizmo-Simba}'s unique black hole growth and feedback model yields agreement with the observed BH scaling relations at the intermediate-mass range and predicts a slightly different slope at high masses where few observations currently lie. \textsc{Gizmo-Simba} provides a novel platform to elucidate the co-evolution of galaxies, gas, and black holes within the densest cosmic environments.
Submitted 31 May, 2022; v1 submitted 28 February, 2022;
originally announced February 2022.
-
Mass Estimation of Planck Galaxy Clusters using Deep Learning
Authors:
Daniel de Andres,
Weiguang Cui,
Florian Ruppin,
Marco De Petris,
Gustavo Yepes,
Ichraf Lahouli,
Gianmarco Aversano,
Romain Dupuis,
Mahmoud Jarraya
Abstract:
The masses of galaxy clusters can be inferred from indirect observations, such as X-ray emission, the Sunyaev-Zeldovich (SZ) effect signal, or optical data. Unfortunately, all of them are affected by some bias. As an alternative, we provide an independent estimate of the cluster masses from the Planck PSZ2 catalog of galaxy clusters using a machine-learning method. We train a Convolutional Neural Network (CNN) model with mock SZ observations from The Three Hundred (the300) hydrodynamic simulations to infer the cluster masses from the real maps of the Planck clusters. The advantage of the CNN is that it makes no a priori assumption of symmetry in the cluster's gas distribution and no additional hypothesis about the cluster's physical state. We compare the cluster masses from the CNN model with those derived by Planck and conclude that the presence of a mass bias is compatible with the simulation results.
Submitted 3 December, 2021; v1 submitted 2 November, 2021;
originally announced November 2021.