-
Generative Models of 21cm EoR Lightcones with 3D Scattering Transforms
Authors:
Ian Hothi,
Erwan Allys,
Benoit Semelin,
Romain Meriot
Abstract:
The 21cm signal from the Epoch of Reionization (EoR) is observed as a three-dimensional data set known as a lightcone, consisting of a redshift (frequency) axis and two spatial sky plane axes. When observed by radio interferometers, this EoR signal is strongly obscured by foregrounds that are several orders of magnitude stronger. Due to its inherently non-Gaussian nature, the EoR signal requires robust statistical tools to accurately separate it from these foreground contaminants, but current foreground separation techniques focus primarily on recovering the EoR power spectrum, often neglecting valuable non-Gaussian information. Recent developments in astrophysics, particularly in the context of the Galactic interstellar medium, have demonstrated the efficacy of scattering transforms - novel summary statistics for highly non-Gaussian processes - for component separation tasks. Motivated by these advances, we extend the scattering transform formalism from two-dimensional data sets to three-dimensional EoR lightcones. To this end, we introduce a 3D wavelet set from the tensor product of 2D isotropic wavelets in the sky plane domain and 1D wavelets in the redshift domain. As generative models form the basis of component separation, our focus here is on building and validating generative models that can be used for component separation in future projects. To achieve this, we construct maximum entropy generative models to synthesise EoR lightcones, and statistically validate the generative model by quantitatively comparing the synthesised EoR lightcones with the single target lightcone used to construct them, using independent statistics such as the power spectrum and Minkowski Functionals. The synthesised lightcones agree well with the target lightcone both statistically and visually, opening up the possibility of developing component separation methods using 3D scattering transforms.
Submitted 11 June, 2025;
originally announced June 2025.
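The tensor-product wavelet construction mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian band-pass profile, the function names `radial_bandpass` and `wavelet_3d`, the bandwidth choice, and the convention that the redshift axis comes last are all assumptions made for illustration. A Gaussian profile also does not vanish exactly at zero frequency, so it only approximates a true wavelet.

```python
import numpy as np

def radial_bandpass(k, k0, sigma):
    """Hypothetical band-pass profile centred on frequency k0 (Gaussian, for illustration)."""
    return np.exp(-0.5 * ((k - k0) / sigma) ** 2)

def wavelet_3d(n_sky, n_z, k0_sky, k0_z):
    """Fourier-space 3D filter: isotropic 2D sky-plane profile times a 1D redshift profile."""
    kx = np.fft.fftfreq(n_sky)
    ky = np.fft.fftfreq(n_sky)
    kz = np.fft.fftfreq(n_z)
    # Isotropy in the sky plane: the 2D filter depends only on |k_perp|.
    k_perp = np.hypot(kx[:, None], ky[None, :])
    psi_sky = radial_bandpass(k_perp, k0_sky, 0.5 * k0_sky)  # shape (n_sky, n_sky)
    psi_z = radial_bandpass(np.abs(kz), k0_z, 0.5 * k0_z)    # shape (n_z,)
    # Tensor product: assumed axis order is (sky x, sky y, redshift).
    return psi_sky[:, :, None] * psi_z[None, None, :]

psi = wavelet_3d(n_sky=32, n_z=16, k0_sky=0.25, k0_z=0.25)

# Apply the filter to a toy cube (white noise standing in for a lightcone).
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 16))
coeff = np.abs(np.fft.ifftn(np.fft.fftn(cube) * psi))
s1 = coeff.mean()  # a simple first-order-style statistic, for illustration only
```

Because the filter is separable, the sky-plane scale and the redshift scale can be varied independently, which is the practical appeal of a tensor-product wavelet set for data whose line-of-sight axis differs physically from the two sky axes.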
-
Square Kilometre Array Science Data Challenge 3a: foreground removal for an EoR experiment
Authors:
A. Bonaldi,
P. Hartley,
R. Braun,
S. Purser,
A. Acharya,
K. Ahn,
M. Aparicio Resco,
O. Bait,
M. Bianco,
A. Chakraborty,
E. Chapman,
S. Chatterjee,
K. Chege,
H. Chen,
X. Chen,
Z. Chen,
L. Conaboy,
M. Cruz,
L. Darriba,
M. De Santis,
P. Denzel,
K. Diao,
J. Feron,
C. Finlay,
B. Gehlot
, et al. (159 additional authors not shown)
Abstract:
We present and analyse the results of the Science Data Challenge 3a (SDC3a, https://sdc3.skao.int/challenges/foregrounds), a community-wide EoR foreground-removal exercise organised by the Square Kilometre Array Observatory (SKAO). The challenge ran for 8 months, from March to October 2023. Participants were provided with realistic simulations of SKA-Low data between 106 MHz and 196 MHz, including foreground contamination from extragalactic as well as Galactic emission, instrumental and systematic effects. They were asked to deliver cylindrical power spectra of the EoR signal, cleaned from all corruptions, and the corresponding confidence levels. Here we describe the approaches taken by the 17 teams that completed the challenge, and we assess their performance using different metrics.
The challenge results provide a positive outlook on the capabilities of current foreground-mitigation approaches to recover the faint EoR signal from SKA-Low observations. The median error committed in the EoR power spectrum recovery is below the true signal for seven teams, although in some cases there are some significant outliers. The smallest residual overall is $4.2_{-4.2}^{+20} \times 10^{-4}\,\rm{K}^2h^{-3}$cMpc$^{3}$ across all considered scales and frequencies.
The estimation of confidence levels provided by the teams is overall less accurate, with the true error being typically under-estimated, sometimes very significantly. The most accurate error bars account for $60 \pm 20$\% of the true errors committed. The challenge results provide a means for all teams to understand and improve their performance. This challenge indicates that the comparison between independent pipelines could be a powerful tool to assess residual biases and improve error estimation.
Submitted 14 March, 2025;
originally announced March 2025.
-
Combining summary statistics with simulation-based inference for the 21 cm signal from the Epoch of Reionization
Authors:
Benoit Semelin,
Romain Mériot,
Ashutosh Mishra,
David Cornu
Abstract:
The 21 cm signal from the Epoch of Reionization will be observed with the upcoming Square Kilometer Array (SKA). SKA should yield a full tomography of the signal, which opens the possibility to explore its non-Gaussian properties. How can we extract the maximum information from the tomography and derive the tightest constraint on the signal? In this work, instead of looking for the most informative summary statistics, we investigate how to combine the information from two sets of summary statistics using simulation-based inference. To this end, we train Neural Density Estimators (NDE) to fit the implicit likelihood of our model, the LICORICE code, using the Loreli II database. We train three different NDEs: one to perform Bayesian inference on the power spectrum, one to do it on the linear moments of the Pixel Distribution Function (PDF) and one to work with the combination of the two. We perform $\sim 900$ inferences at different points in our parameter space and use them to assess both the validity of our posteriors with Simulation-based Calibration (SBC) and the typical gain obtained by combining summary statistics. We find that our posteriors are biased by no more than $\sim 20 \%$ of their standard deviation and under-confident by no more than $\sim 15 \%$. Then, we establish that combining summary statistics produces a contraction of the 4D volume of the posterior (derived from the generalized variance) in 91.5 % of our cases, and in 70 to 80 % of the cases for the marginalized 1D posteriors. The median volume variation is a contraction of a factor of a few for the 4D posteriors and a contraction of 20 to 30 % in the case of the marginalized 1D posteriors. This shows that our approach is a possible alternative to looking for sufficient statistics in the theoretical sense.
Submitted 21 November, 2024;
originally announced November 2024.
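The posterior-volume comparison mentioned in the abstract above can be sketched as follows. This is a hedged illustration, not the paper's pipeline: it assumes the volume proxy is the square root of the generalized variance (the determinant of the sample covariance), and the function names and toy Gaussian samples are hypothetical.

```python
import numpy as np

def posterior_volume(samples):
    """Volume proxy (assumption): sqrt of the generalized variance, det(cov)."""
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    return float(np.sqrt(np.linalg.det(cov)))

def volume_ratio(samples_single, samples_combined):
    """Ratio below 1 means the combined-statistics posterior is tighter."""
    return posterior_volume(samples_combined) / posterior_volume(samples_single)

# Toy 4D posteriors: the "combined" one has half the width in every direction.
rng = np.random.default_rng(0)
wide = rng.normal(scale=1.0, size=(20000, 4))
narrow = rng.normal(scale=0.5, size=(20000, 4))
ratio = volume_ratio(wide, narrow)  # close to 0.5**4 for these toy samples
```

Applying `posterior_volume` to a single column of samples gives the standard deviation, so the same helper covers the marginalized 1D case.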
-
Comparison of Bayesian inference methods using the Loreli II database of hydro-radiative simulations of the 21-cm signal
Authors:
Romain Meriot,
Benoit Semelin,
David Cornu
Abstract:
While the observation of the 21 cm signal from the Cosmic Dawn and Epoch of Reionization is an instrumental challenge, the interpretation of a prospective detection is still open to questions regarding the modelling of the signal and the Bayesian inference techniques that bridge the gap between theory and observations. To address some of these questions, we present Loreli II, a database of nearly 10 000 simulations of the 21 cm signal run with the Licorice 3D radiative transfer code. With Loreli II, we explore a 5-dimensional astrophysical parameter space where star formation, X-ray emissions, and UV emissions are varied. We then use this database to train neural networks and perform Bayesian inference on 21 cm power spectra affected by thermal noise at the level of 100 hours of observation with the Square Kilometer Array. We study and compare three inference techniques: an emulator of the power spectrum, a Neural Density Estimator that fits the implicit likelihood of the model, and a Bayesian Neural Network that directly fits the posterior distribution. We measure the performance of each method by comparing them on a statistically representative set of inferences, notably using the principles of Simulation-Based Calibration. We report errors on the 1-D marginalized posteriors (biases and over/under confidence) below $15 \%$ of the standard deviation for the emulator and below $25 \%$ for the other methods. We conclude that at our noise level and our sampling density of the parameter space, an explicit Gaussian likelihood is sufficient. This may not be the case at a lower noise level or if a denser sampling is used to reach higher accuracy. We then apply the emulator method to recent HERA upper limits and report weak constraints on the X-ray emissivity parameter of our model.
Submitted 5 November, 2024;
originally announced November 2024.
-
First upper limits on the 21 cm signal power spectrum from cosmic dawn from one night of observations with NenuFAR
Authors:
S. Munshi,
F. G. Mertens,
L. V. E. Koopmans,
A. R. Offringa,
B. Semelin,
D. Aubert,
R. Barkana,
A. Bracco,
S. A. Brackenhoff,
B. Cecconi,
E. Ceccotti,
S. Corbel,
A. Fialkov,
B. K. Gehlot,
R. Ghara,
J. N. Girard,
J. M. Grießmeier,
C. Höfer,
I. Hothi,
R. Mériot,
M. Mevius,
P. Ocvirk,
A. K. Shaw,
G. Theureau,
S. Yatawatta
, et al. (2 additional authors not shown)
Abstract:
The redshifted 21 cm signal from neutral hydrogen is a direct probe of the physics of the early universe and has been an important science driver of many present and upcoming radio interferometers. In this study we use a single night of observations with the New Extension in Nançay Upgrading LOFAR (NenuFAR) to place upper limits on the 21 cm power spectrum from cosmic dawn at a redshift of $z$ = 20.3. NenuFAR is a new low-frequency radio interferometer, operating in the 10-85 MHz frequency range, currently under construction at the Nançay Radio Observatory in France. It is a phased array instrument with a very dense uv coverage at short baselines, making it one of the most sensitive instruments for 21 cm cosmology analyses at these frequencies. Our analysis adopts the foreground subtraction approach, in which sky sources are modeled and subtracted through calibration and residual foregrounds are subsequently removed using Gaussian process regression. The final power spectra are constructed from the gridded residual data cubes in the uv plane. Signal injection tests are performed at each step of the analysis pipeline, the relevant pipeline settings are optimized to ensure minimal signal loss, and any signal suppression is accounted for through a bias correction on our final upper limits. We obtain a best 2$σ$ upper limit of $2.4\times 10^7$ $\text{mK}^{2}$ at $z$ = 20.3 and $k$ = 0.041 $h\,\text{cMpc}^{-1}$. We see a strong excess power in the data, making our upper limits two orders of magnitude higher than the thermal noise limit. We investigate the origin and nature of this excess power and discuss further improvements to the analysis pipeline that can potentially mitigate it and consequently allow us to reach thermal noise sensitivity when multiple nights of observations are processed in the future.
Submitted 30 April, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
The LoReLi database: 21 cm signal inference with 3D radiative hydrodynamics simulations
Authors:
Romain Meriot,
Benoit Semelin
Abstract:
The Square Kilometer Array is expected to measure the 21cm signal from the Epoch of Reionization (EoR) in the coming decade, and its pathfinders may provide a statistical detection even earlier. The currently reported upper limits provide tentative constraints on the astrophysical parameters of the models of the EoR.
In order to interpret such data with 3D radiative hydrodynamics simulations using Bayesian inference, we present the latest developments of the \textsc{Licorice} code. Relying on an implementation of the halo conditional mass function to account for unresolved star formation, this code now allows accurate simulations of the EoR at $256^3$ resolution. We use this version of \textsc{Licorice} to produce the first iteration of \textsc{LoReLi}, a public dataset now containing hundreds of 21cm signals computed from radiative hydrodynamics simulations. We train a neural network on \textsc{LoReLi} to provide a fast emulator of the \textsc{Licorice} power spectra, \textsc{LorEMU}, which has $\sim 5\%$ rms error relative to the simulated signals. \textsc{LorEMU} is used in a Markov Chain Monte Carlo framework to perform Bayesian inference, first on a mock observation composed of a simulated signal and thermal noise corresponding to 100h observations with the SKA. We then apply our inference pipeline to the latest measurements from the HERA interferometer. We report constraints on the X-ray emissivity, and confirm that cold reionization scenarios are unlikely to accurately represent our Universe.
Submitted 1 December, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Accurate modelling of the Lyman-$α$ coupling for the 21-cm signal, observability with NenuFAR and SKA
Authors:
Benoit Semelin,
Romain Mériot,
Florent Mertens,
Léon V. E. Koopmans,
Dominique Aubert,
Rennan Barkana,
Anastasia Fialkov,
Satyapan Munshi,
Pierre Ocvirk
Abstract:
The measurement of the $21$ cm signal from the Cosmic Dawn is a major goal for several existing and upcoming radio interferometers such as NenuFAR and the SKA. During this era before the beginning of the Epoch of Reionization, the signal is more difficult to observe due to brighter foregrounds but reveals additional information on the underlying astrophysical processes encoded in the spatial fluctuations of the spin temperature of hydrogen. To interpret future measurements, controlling the level of accuracy of the Lyman-$α$ flux modelling is mandatory. In this work, we evaluate the impact of various approximations that exist in the main fast modelling approach compared to the results of a costly full radiative transfer simulation. The fast SPINTER code, presented in this work, computes the Lyman-$α$ flux including the effect of wing scatterings for an inhomogeneous emissivity field, but assuming an otherwise homogeneous expanding universe. The LICORICE code computes the full radiative transfer in the Lyman-$α$ line without any substantial approximation. We find that the difference between homogeneous and inhomogeneous gas density and temperature is very small for the computed flux. On the contrary, neglecting the effect of gas velocities produces a significant change in the computed flux. We identify the causes (mainly Doppler shifts due to velocity gradients) and quantify the magnitude of the effect in both an idealised setup and a realistic cosmological situation. We find that the amplitude of the effect, up to a factor of $\sim 2$ on the $21$ cm signal power spectrum on some scales (depending on both other model parameters and the redshift), can be easily discriminated with an SKA-like survey and already be approached, particularly for exotic signals, by the ongoing NenuFAR Cosmic Dawn Key Science Program.
Submitted 30 January, 2023;
originally announced January 2023.
-
The Cosmic Mach Number as an Environment Measure for the Underlying Dark Matter Density Field
Authors:
Romain Meriot,
Sadegh Khochfar,
Jose Onorbe,
Britton Smith
Abstract:
Using cosmological dark-matter-only simulations of a $(1.6$ Gpc$/h)^3$ volume from the Legacy simulation project, we calculate Cosmic Mach Numbers (CMN) and perform a theoretical investigation of their relation with halo properties and features of the density field to gauge their use as a measure of the environment.
CMNs calculated on individual spheres show correlations with both the overdensity in a region and the density gradient in the direction of the bulk flow around that region. To reduce the scatter around the median of these correlations, we introduce a new measure, the rank ordered Cosmic Mach number ($\hat{\mathcal{M}}_g$), which shows a tight correlation with the overdensity $δ=\frac{ρ-\bar{ρ}}{\bar{ρ}}$. Measures of the large scale density gradient as well as other average properties of the halo population in a region show tight correlations with $\hat{\mathcal{M}}_g$ as well. Our results in this first empirical study suggest that $\hat{\mathcal{M}}_g$ is an excellent proxy for the underlying density field and hence environment that can circumvent reliance on number density counts in a region. For scales between $10$ and $100$ Mpc$/h$, Mach numbers calculated using dark matter halos $(> 10^{12}$ M$_{\odot})$ that would typically host massive galaxies are consistent with theoretical predictions of the linear matter power spectrum at a level of $10\%$ due to non-linear effects of gravity. At redshifts $z\geq 3$, these deviations disappear. We also quantify errors due to missing large scale modes in simulations. Simulations of box size $\leq 1$ Gpc$/h$ typically predict CMNs 10-30\% too small on scales of $\sim 100$ Mpc$/h$.
Submitted 13 February, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.