-
Field-Level Comparison and Robustness Analysis of Cosmological N-Body Simulations
Authors:
Adrian E. Bayer,
Francisco Villaescusa-Navarro,
Sammy Sharief,
Romain Teyssier,
Lehman H. Garrison,
Laurence Perreault-Levasseur,
Greg L. Bryan,
Marco Gatti,
Eli Visbal
Abstract:
We present the first field-level comparison of cosmological N-body simulations, considering various widely used codes: Abacus, CUBEP$^3$M, Enzo, Gadget, Gizmo, PKDGrav, and Ramses. Unlike previous comparisons focused on summary statistics, we conduct a comprehensive field-level analysis: evaluating statistical similarity, quantifying implications for cosmological parameter inference, and identifying the regimes in which simulations are consistent. We begin with a traditional comparison using the power spectrum, cross-correlation coefficient, and visual inspection of the matter field. We follow this with a statistical out-of-distribution (OOD) analysis to quantify distributional differences between simulations, revealing insights not captured by the traditional metrics. We then perform field-level simulation-based inference (SBI) using convolutional neural networks (CNNs), training on one simulation and testing on others, including a full hydrodynamic simulation for comparison. We identify several causes of OOD behavior and biased inference, finding that resolution effects, such as those arising from adaptive mesh refinement (AMR), have a significant impact. Models trained on non-AMR simulations fail catastrophically when evaluated on AMR simulations, introducing larger biases than those from hydrodynamic effects. Differences in resolution, even when using the same N-body code, likewise lead to biased inference. We attribute these failures to a CNN's sensitivity to small-scale fluctuations, particularly in voids and filaments, and demonstrate that appropriate smoothing brings the simulations into statistical agreement. Our findings motivate the need for careful data filtering and the use of field-level OOD metrics, such as PQMass, to ensure robust inference.
Submitted 19 May, 2025;
originally announced May 2025.
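The cross-correlation coefficient mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two overdensity fields sampled on the same periodic cubic grid and computes $r(k) = P_{ab}/\sqrt{P_{aa}P_{bb}}$ in spherical $k$ bins. The function name, the linear binning, and the box size are illustrative assumptions and not the paper's pipeline.

```python
import numpy as np


def cross_correlation_coefficient(delta_a, delta_b, box_size, n_bins=16):
    """Return bin-centre k and r(k) for two 3D fields on the same cubic grid.

    Assumption: both fields are periodic, real, and have shape (n, n, n).
    Modes in the real-FFT half-space are weighted equally, which is adequate
    for a ratio such as r(k) but not for a calibrated power spectrum.
    """
    n = delta_a.shape[0]
    fa = np.fft.rfftn(delta_a)
    fb = np.fft.rfftn(delta_b)

    # Wavenumber magnitude of every mode, in units of 1 / (units of box_size).
    k_fund = 2.0 * np.pi / box_size
    kx = np.fft.fftfreq(n, d=1.0 / n) * k_fund
    kz = np.fft.rfftfreq(n, d=1.0 / n) * k_fund
    k_mag = np.sqrt(
        kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2
    )

    # Linear bins from the fundamental mode to the grid Nyquist frequency.
    edges = np.linspace(k_fund, k_fund * n / 2, n_bins + 1)
    idx = np.digitize(k_mag.ravel(), edges) - 1
    valid = (idx >= 0) & (idx < n_bins)  # drops k = 0 and modes past Nyquist

    def binned_sum(values):
        return np.bincount(idx[valid], weights=values.ravel()[valid], minlength=n_bins)

    p_ab = binned_sum((fa * np.conj(fb)).real)
    p_aa = binned_sum(np.abs(fa) ** 2)
    p_bb = binned_sum(np.abs(fb) ** 2)

    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, p_ab / np.sqrt(p_aa * p_bb)
```

By construction $r(k)=1$ for identical fields and $|r(k)|\le 1$ otherwise, so departures from unity indicate the scales on which two simulations decorrelate in phase.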
-
Interpreting Cosmological Information from Neural Networks in the Hydrodynamic Universe
Authors:
Arnab Lahiry,
Adrian E. Bayer,
Francisco Villaescusa-Navarro
Abstract:
What happens when a black box (neural network) meets a black box (simulation of the Universe)? Recent work has shown that convolutional neural networks (CNNs) can infer cosmological parameters from the matter density field in the presence of complex baryonic processes. A key question that arises is: which parts of the cosmic web does the neural network obtain information from? We shed light on the matter by identifying the Fourier scales, density scales, and morphological features of the cosmic web that CNNs pay most attention to. We find that CNNs extract cosmological information from both high and low density regions: overdense regions provide the most information per pixel, while underdense regions -- particularly deep voids and their surroundings -- contribute significantly due to their large spatial extent and coherent spatial features. Remarkably, we demonstrate that there is negligible degradation in cosmological constraining power after aggressive cutting in both maximum Fourier scale and density. Furthermore, we find similar results when considering both hydrodynamic and gravity-only simulations, implying that neural networks can marginalize over baryonic effects with minimal loss in cosmological constraining power. Our findings point to practical strategies for optimal and robust field-level cosmological inference in the presence of uncertainly modeled astrophysics.
Submitted 24 April, 2025;
originally announced April 2025.
-
Initial Conditions from Galaxies: Machine-Learning Subgrid Correction to Standard Reconstruction
Authors:
Liam Parker,
Adrian E. Bayer,
Uros Seljak
Abstract:
We present a hybrid method for reconstructing the primordial density from late-time halos and galaxies. Our approach involves two steps: (1) apply standard Baryon Acoustic Oscillation (BAO) reconstruction to recover the large-scale features in the primordial density field and (2) train a deep learning model to learn small-scale corrections on partitioned subgrids of the full volume. At inference, this correction is then convolved across the full survey volume, enabling scaling to large survey volumes. We train our method on both mock halo catalogs and mock galaxy catalogs in both configuration and redshift space from the Quijote $1\,(h^{-1}\,\mathrm{Gpc})^3$ simulation suite. When evaluated on held-out simulations, our combined approach significantly improves the reconstruction cross-correlation coefficient with the true initial density field and remains robust to moderate model misspecification. Additionally, we show that models trained on $1\,(h^{-1}\,\mathrm{Gpc})^3$ can be applied to larger boxes, e.g. $(3\,h^{-1}\,\mathrm{Gpc})^3$, without retraining. Finally, we perform a Fisher analysis on our method's recovery of the BAO peak, and find that it significantly improves the error on the acoustic scale relative to standard BAO reconstruction. Ultimately, this method robustly captures nonlinearities and bias without sacrificing large-scale accuracy, and its flexibility to handle arbitrarily large volumes without escalating computational requirements makes it especially promising for large-volume surveys like DESI.
Submitted 1 April, 2025;
originally announced April 2025.
-
The Power of the Cosmic Web
Authors:
James Sunseri,
Adrian E. Bayer,
Jia Liu
Abstract:
We study the cosmological information contained in the cosmic web, categorized as four structure types: nodes, filaments, walls, and voids, using the Quijote simulations and a modified nexus+ algorithm. We show that splitting the density field by the four structure types and combining the power spectrum in each provides much tighter constraints on cosmological parameters than using the power spectrum without splitting. We show the rich information contained in the cosmic web structures -- related to the Hessian of the density field -- for measuring all of the cosmological parameters, and in particular for constraining neutrino mass. We study the constraints as a function of Fourier scale, configuration space smoothing scale, and the underlying field. For the matter field with $k_{\rm max}=0.5\,h/{\rm Mpc}$, we find a factor of 20 tighter constraints on neutrino mass when using smoothing scales larger than 12.5 Mpc/$h$, and a factor of 80 tighter when using smoothing scales down to 1.95 Mpc/$h$. However, for the CDM+Baryon field we observe a more modest improvement, by a factor of 1.7 or 3.6 for large and small smoothing scales respectively. We release our new Python package for identifying cosmic structures, pycosmommf, at https://github.com/James11222/pycosmommf to enable future studies of the cosmological information of the cosmic web.
Submitted 14 March, 2025;
originally announced March 2025.
-
Massive $ν$s through the CNN lens: interpreting the field-level neutrino mass information in weak lensing
Authors:
Malika Golshan,
Adrian E. Bayer
Abstract:
Modern cosmological surveys probe the Universe deep into the nonlinear regime, where massive neutrinos suppress cosmic structure. Traditional cosmological analyses, which use the 2-point correlation function to extract information, are no longer optimal in the nonlinear regime, and there is thus much interest in extracting beyond-2-point information to improve constraints on neutrino mass. Quantifying and interpreting the beyond-2-point information is thus a pressing task. We study the field-level information in weak lensing convergence maps using convolutional neural networks. We find that the network performance increases as higher source redshifts and smaller scales are considered -- investigating up to a source redshift of 2.5 and $\ell_{\rm max}\simeq10^4$ -- verifying that massive neutrinos leave a distinct effect on weak lensing. However, the performance of the network significantly drops after scaling out the 2-point information from the maps, implying that most of the field-level information can be found in the 2-point correlation function alone. We quantify these findings in terms of the likelihood ratio and also use Integrated Gradient saliency maps to interpret which parts of the map the network is learning the most from. We find that, in the absence of noise, the network extracts a similar amount of information from the most overdense and underdense regions. However, upon adding noise, the information in underdense regions is distorted as noise disproportionately washes out void-like structures.
Submitted 11 April, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Simulation-Based Inference Benchmark for LSST Weak Lensing Cosmology
Authors:
Justine Zeghal,
Denise Lanzieri,
François Lanusse,
Alexandre Boucaud,
Gilles Louppe,
Eric Aubourg,
Adrian E. Bayer,
The LSST Dark Energy Science Collaboration
Abstract:
Standard cosmological analysis, which relies on two-point statistics, fails to extract the full information in the data. This limits our ability to constrain cosmological parameters precisely. Thus, recent years have seen a paradigm shift from analytical likelihood-based to simulation-based inference. However, such methods require a large number of costly simulations. We focus on full-field inference, considered the optimal form of inference. Our objective is to benchmark several ways of conducting full-field inference to gain insight into the number of simulations required for each method. We make a distinction between explicit and implicit full-field inference. Moreover, as it is crucial for explicit full-field inference to use a differentiable forward model, we aim to discuss the advantages of having this property for the implicit approach. We use the sbi_lens package, which provides a fast and differentiable log-normal forward model. This forward model enables us to compare explicit and implicit full-field inference with and without gradients. The former is achieved by sampling the forward model with the No-U-Turn Sampler (NUTS). The latter starts by compressing the data into sufficient statistics and then uses the Neural Likelihood Estimation algorithm, both in its standard form and augmented with gradients. We perform a full-field analysis on simulated LSST Y10-like weak lensing mass maps. We show that explicit and implicit full-field inference yield consistent constraints. Explicit inference requires 630 000 simulations with our particular sampler, corresponding to 400 independent samples. Implicit inference requires a maximum of 101 000 simulations, split into 100 000 simulations to build sufficient statistics (this number is not fine-tuned) and 1 000 simulations to perform inference. Additionally, we show that our way of exploiting the gradients does not significantly help implicit inference.
Submitted 26 September, 2024;
originally announced September 2024.
-
CHARM: Creating Halos with Auto-Regressive Multi-stage networks
Authors:
Shivam Pandey,
Chirag Modi,
Benjamin D. Wandelt,
Deaglan J. Bartlett,
Adrian E. Bayer,
Greg L. Bryan,
Matthew Ho,
Guilhem Lavaux,
T. Lucas Makinen,
Francisco Villaescusa-Navarro
Abstract:
To maximize the amount of information extracted from cosmological datasets, simulations that accurately represent these observations are necessary. However, traditional simulations that evolve particles under gravity by estimating particle-particle interactions (N-body simulations) are computationally expensive and prohibitive to scale to the large volumes and resolutions necessary for the upcoming datasets. Moreover, modeling the distribution of galaxies typically involves identifying virialized dark matter halos, which is also a time- and memory-consuming process for large N-body simulations, further exacerbating the computational cost. In this study, we introduce CHARM, a novel method for creating mock halo catalogs by matching the spatial, mass, and velocity statistics of halos directly from the large-scale distribution of the dark matter density field. We develop multi-stage neural spline flow-based networks to learn this mapping at redshift $z=0.5$ directly with computationally cheaper low-resolution particle mesh simulations instead of relying on the high-resolution N-body simulations. We show that the mock halo catalogs and painted galaxy catalogs have the same statistical properties as obtained from N-body simulations in both real space and redshift space. Finally, we use these mock catalogs for cosmological inference using redshift-space galaxy power spectrum, bispectrum, and wavelet-based statistics using simulation-based inference, performing the first inference with accelerated forward model simulations and finding unbiased cosmological constraints with well-calibrated posteriors. The code was developed as part of the Simons Collaboration on Learning the Universe and is publicly available at https://github.com/shivampcosmo/CHARM.
Submitted 13 September, 2024;
originally announced September 2024.
-
Periodicity significance testing with null-signal templates: reassessment of PTF's SMBH binary candidates
Authors:
Jakob Robnik,
Adrian E. Bayer,
Maria Charisi,
Zoltán Haiman,
Allison Lin,
Uroš Seljak
Abstract:
Periodograms are widely employed for identifying periodicity in time series data, yet they often struggle to accurately quantify the statistical significance of detected periodic signals when the data complexity precludes reliable simulations. We develop a data-driven approach to address this challenge by introducing a null-signal template (NST). The NST is created by carefully randomizing the period of each cycle in the periodogram template, rendering it non-periodic. It has the same frequentist properties as a periodic signal template regardless of the noise probability distribution, and we show with simulations that the distribution of false positives is the same as with the original periodic template, regardless of the underlying data. Thus, performing a periodicity search with the NST acts as an effective simulation of the null (no-signal) hypothesis, without having to simulate the noise properties of the data. We apply the NST method to the supermassive black hole binaries (SMBHB) search in the Palomar Transient Factory (PTF), where Charisi et al. had previously proposed 33 high signal to (white) noise candidates utilizing simulations to quantify their significance. Our approach reveals that these simulations do not capture the complexity of the real data. There are no statistically significant periodic signal detections above the non-periodic background. To improve the search sensitivity we introduce a Gaussian quadrature based algorithm for the Bayes Factor with correlated noise as a test statistic, in contrast to the standard signal to white noise. We show with simulations that this improves sensitivity to true signals by more than an order of magnitude. However, using the Bayes Factor approach also results in no statistically significant detections in the PTF data.
Submitted 24 July, 2024;
originally announced July 2024.
-
The HalfDome Multi-Survey Cosmological Simulations: N-body Simulations
Authors:
Adrian E. Bayer,
Yici Zhong,
Zack Li,
Joseph DeRose,
Yu Feng,
Jia Liu
Abstract:
Upcoming cosmological surveys have the potential to make groundbreaking discoveries on multiple fronts, including the neutrino mass, dark energy, and inflation. Most of the key science goals require the joint analysis of datasets from multiple surveys to break parameter degeneracies and calibrate systematics. To realize such analyses, a large set of mock simulations that realistically model correlated observables is required. In this paper we present the N-body component of the HalfDome cosmological simulations, designed for the joint analysis of Stage-IV cosmological surveys, such as Rubin LSST, Euclid, SPHEREx, Roman, DESI, PFS, Simons Observatory, CMB-S4, and LiteBIRD. Our 300 TB initial data release includes full-sky lightcones and halo catalogs between $z=0$ and $z=4$ for 11 fixed cosmology realizations, as well as an additional run with local primordial non-Gaussianity ($f_{\rm NL}=20$). The simulations evolve $6144^3$ particles in a $3.75\,h^{-1}\,{\rm Gpc}$ box, reaching a minimum halo mass of $\sim 6 \times 10^{12}\,h^{-1} M_\odot$ and maximum scale of $k \sim 1\,h\,{\rm Mpc}^{-1}$. Our data is publicly available: instructions to access the data and plans for future data releases can be found at https://halfdomesims.github.io.
Submitted 9 May, 2025; v1 submitted 24 July, 2024;
originally announced July 2024.
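As a worked check on the numbers quoted in the abstract above, the mean interparticle spacing and the corresponding particle Nyquist wavenumber follow from the box size $L = 3.75\,h^{-1}\,{\rm Gpc}$ and $N = 6144$ particles per dimension. This is simple arithmetic on the stated values and says nothing about the force resolution of the code actually used, which is not given here.

```latex
\frac{L}{N} = \frac{3750\,h^{-1}\,\mathrm{Mpc}}{6144} \approx 0.61\,h^{-1}\,\mathrm{Mpc},
\qquad
k_{\rm Nyq} = \frac{\pi N}{L} = \frac{\pi \times 6144}{3750\,h^{-1}\,\mathrm{Mpc}} \approx 5.1\,h\,\mathrm{Mpc}^{-1}
```

The quoted maximum scale of $k \sim 1\,h\,{\rm Mpc}^{-1}$ therefore sits roughly a factor of five below the particle Nyquist wavenumber.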
-
Significance of void shape: Neutrino mass from Voronoi void halos?
Authors:
Adrian E. Bayer,
Jia Liu,
Christina D. Kreisch,
Alice Pisani
Abstract:
Massive neutrinos suppress the growth of cosmic structure on nonlinear scales, motivating the use of information beyond the power spectrum to tighten constraints on the neutrino mass, for example by considering cosmic voids. It was recently proposed that constraints on neutrino mass from the halo mass function (HMF) can be improved by considering only the halos that reside within voids -- the void-halo mass function (VHMF). We extend this analysis, which made spherical assumptions about the shape of voids, to take into account the non-spherical nature of voids as defined by the Voronoi-tessellation-based void finder, VIDE. In turn, after accounting for one spurious non-spherical void, we find no evidence that the VHMF contains information beyond the HMF. Given this finding, we then introduce a novel summary statistic by splitting halos according to the emptiness of their individual environments, defined by the Voronoi cell volume each halo resides in, and combining the mass functions from each split. We name the corresponding statistic the VorHMF and find that it could provide information regarding neutrino mass beyond the HMF. Our work thus motivates the importance of accounting for the full shape of voids in future analyses, both in terms of removing outliers to achieve robust results and as an additional source of cosmological information.
Submitted 2 October, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Field-Level Inference with Microcanonical Langevin Monte Carlo
Authors:
Adrian E. Bayer,
Uros Seljak,
Chirag Modi
Abstract:
Field-level inference provides a means to optimally extract information from upcoming cosmological surveys, but requires efficient sampling of a high-dimensional parameter space. This work applies Microcanonical Langevin Monte Carlo (MCLMC) to sample the initial conditions of the Universe, as well as the cosmological parameters $σ_8$ and $Ω_m$, from simulations of cosmic structure. MCLMC is shown to be over an order of magnitude more efficient than traditional Hamiltonian Monte Carlo (HMC) for a $\sim 2.6 \times 10^5$ dimensional problem. Moreover, the efficiency of MCLMC compared to HMC greatly increases as the dimensionality increases, suggesting gains of many orders of magnitude for the dimensionalities required by upcoming cosmological surveys.
Submitted 18 July, 2023;
originally announced July 2023.
-
Joint velocity and density reconstruction of the Universe with nonlinear differentiable forward modeling
Authors:
Adrian E. Bayer,
Chirag Modi,
Simone Ferraro
Abstract:
Reconstructing the initial conditions of the Universe from late-time observations has the potential to optimally extract cosmological information. Due to the high dimensionality of the parameter space, a differentiable forward model is needed for convergence, and recent advances have made it possible to perform reconstruction with nonlinear models based on galaxy (or halo) positions. In addition to positions, future surveys will provide measurements of galaxies' peculiar velocities through the kinematic Sunyaev-Zel'dovich effect (kSZ), type Ia supernovae, and the fundamental plane or Tully-Fisher relations. Here we develop the formalism for including halo velocities, in addition to halo positions, to enhance the reconstruction of the initial conditions. We show that using velocity information can significantly improve the reconstruction accuracy compared to using only the halo density field. We study this improvement as a function of shot noise, velocity measurement noise, and angle to the line of sight. We also show how halo velocity data can be used to improve the reconstruction of the final nonlinear matter overdensity and velocity fields. We have built our pipeline into the differentiable Particle-Mesh FlowPM package, paving the way to perform field-level cosmological inference with joint velocity and density reconstruction. This is especially useful given the increased ability to measure peculiar velocities in the near future.
Submitted 17 July, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Super-sample covariance of the power spectrum, bispectrum, halos, voids, and their cross covariances
Authors:
Adrian E. Bayer,
Jia Liu,
Ryo Terasawa,
Alexandre Barreira,
Yici Zhong,
Yu Feng
Abstract:
We study the effect of super-sample covariance (SSC) on the power spectrum and higher-order statistics: bispectrum, halo mass function, and void size function. We also investigate the effect of SSC on the cross covariance between the statistics. We consider both the matter and halo fields. Higher-order statistics of the large-scale structure contain additional cosmological information beyond the power spectrum and are a powerful tool to constrain cosmology. They are a promising probe for ongoing and upcoming high precision cosmological surveys such as DESI, PFS, Rubin Observatory LSST, Euclid, SPHEREx, SKA, and Roman Space Telescope. Cosmological simulations used in modeling and validating these statistics often have sizes that are much smaller than the observed Universe. Density fluctuations on scales larger than the simulation box, known as super-sample modes, are not captured by the simulations and in turn can lead to inaccuracies in the covariance matrix. We compare the covariance measured using simulation boxes containing super-sample modes to those without. We also compare with the Separate Universe approach. We find that while the power spectrum, bispectrum and halo mass function show significant scale- or mass-dependent SSC, the void size function shows relatively small SSC. We also find significant SSC contributions to the cross covariances between the different statistics, implying that future joint-analyses will need to carefully take into consideration the effect of SSC. To enable further study of SSC, our simulations have been made publicly available at https://github.com/HalfDomeSims/ssc.
Submitted 16 August, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
The DESI $N$-body Simulation Project -- II. Suppressing sample variance with fast simulations
Authors:
Zhejie Ding,
Chia-Hsun Chuang,
Yu Yu,
Lehman H. Garrison,
Adrian E. Bayer,
Yu Feng,
Chirag Modi,
Daniel J. Eisenstein,
Martin White,
Andrei Variu,
Cheng Zhao,
Hanyu Zhang,
Jennifer Meneses Rizo,
David Brooks,
Kyle Dawson,
Peter Doel,
Enrique Gaztanaga,
Robert Kehoe,
Alex Krolewski,
Martin Landriau,
Nathalie Palanque-Delabrouille,
Claire Poppett
Abstract:
The Dark Energy Spectroscopic Instrument (DESI) will construct a large and precise three-dimensional map of our Universe. The survey effective volume reaches $\sim20\,h^{-3}\,\mathrm{Gpc}^3$. It is a great challenge to prepare high-resolution simulations with a much larger volume for validating the DESI analysis pipelines. AbacusSummit is a suite of high-resolution dark-matter-only simulations designed for this purpose, with $200\,h^{-3}\,\mathrm{Gpc}^3$ (10 times DESI volume) for the base cosmology. However, further effort is needed to provide a more precise analysis of the data and to also cover other cosmologies. Recently, the CARPool method was proposed to use paired accurate and approximate simulations to achieve high statistical precision with a limited number of high-resolution simulations. Relying on this technique, we propose to use fast quasi-$N$-body solvers combined with accurate simulations to produce accurate summary statistics. This enables us to obtain 100 times smaller variance than the expected DESI statistical variance at the scales we are interested in, e.g. $k < 0.3\,h\,\mathrm{Mpc}^{-1}$ for the halo power spectrum. In addition, it can significantly suppress the sample variance of the halo bispectrum. We further generalize the method for other cosmologies with only one realization in the AbacusSummit suite to extend the effective volume by a factor of $\sim 20$. In summary, our proposed strategy of combining high-fidelity simulations with fast approximate gravity solvers and a series of variance suppression techniques sets the path for a robust cosmological analysis of galaxy survey data.
Submitted 18 June, 2022; v1 submitted 12 February, 2022;
originally announced February 2022.
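The idea of pairing accurate and approximate simulations to suppress variance can be sketched as a scalar control variate. This is a toy illustration under stated assumptions, not the CARPool implementation: `y` stands for a summary statistic measured in the expensive simulations, `c` for the same statistic measured in paired fast simulations that share initial conditions, and `mu_c` for the mean of the fast statistic, assumed to be known precisely from many cheap runs. All names are hypothetical.

```python
import numpy as np


def control_variate_samples(y, c, mu_c):
    """Return variance-reduced samples x = y - beta * (c - mu_c) and beta.

    beta is the variance-minimising coefficient cov(y, c) / var(c), estimated
    here from the same paired samples (an assumption that introduces a small
    bias for very few samples).
    """
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    cov = np.cov(y, c)  # 2x2 sample covariance of the paired statistics
    beta = cov[0, 1] / cov[1, 1]
    x = y - beta * (c - mu_c)
    return x, beta
```

When `y` and `c` are strongly correlated, the scatter of `x` around the true mean of `y` is much smaller than the scatter of `y` itself, so fewer expensive simulations are needed for a given precision on the mean.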
-
Self-Calibrating the Look-Elsewhere Effect: Fast Evaluation of the Statistical Significance Using Peak Heights
Authors:
Adrian E. Bayer,
Uros Seljak,
Jakob Robnik
Abstract:
In experiments where one searches a large parameter space for an anomaly, one often finds many spurious noise-induced peaks in the likelihood. This is known as the look-elsewhere effect, and must be corrected for when performing statistical analysis. This paper introduces a method to calibrate the false alarm probability (FAP), or $p$-value, for a given dataset by considering the heights of the highest peaks in the likelihood. In the simplest form of self-calibration, the look-elsewhere-corrected $χ^2$ of a physical peak is approximated by the $χ^2$ of the peak minus the $χ^2$ of the highest noise-induced peak. Generalizing this concept to consider lower peaks provides a fast method to quantify the statistical significance with improved accuracy. In contrast to alternative methods, this approach has negligible computational cost as peaks in the likelihood are a byproduct of every peak-search analysis. We apply the method to examples from astronomy, including planet detection, periodograms, and cosmology.
Submitted 13 August, 2021;
originally announced August 2021.
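The simplest form of self-calibration stated in the abstract above (peak $χ^2$ minus the $χ^2$ of the highest noise-induced peak) can be sketched in a few lines. The function name and the toy peak heights are hypothetical, and the sketch assumes the second-highest peak in the search is the highest noise-induced one, which the abstract does not specify.

```python
def self_calibrated_chi2(peak_chi2):
    """Look-elsewhere-corrected chi^2 in the simplest self-calibration.

    peak_chi2: chi^2 values of the peaks found in a likelihood search.
    Assumption of this sketch: the second-highest peak is noise-induced.
    """
    if len(peak_chi2) < 2:
        raise ValueError("need at least two peaks to self-calibrate")
    ranked = sorted(peak_chi2, reverse=True)
    candidate, highest_noise = ranked[0], ranked[1]
    return candidate - highest_noise


# Toy peak heights from a hypothetical periodogram search.
peaks = [31.5, 12.0, 9.5, 11.0]
corrected = self_calibrated_chi2(peaks)
```

The peak list is already produced by the search itself, which is why the correction adds essentially no computational cost.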
-
Beware of Fake $ν$s: The Effect of Massive Neutrinos on the Nonlinear Evolution of Cosmic Structure
Authors:
Adrian E. Bayer,
Arka Banerjee,
Uros Seljak
Abstract:
Massive neutrinos suppress the growth of cosmic structure on small, non-linear scales. It is thus often proposed that using statistics beyond the power spectrum can tighten constraints on the neutrino mass by extracting additional information from these non-linear scales. We study the information content regarding neutrino mass at the field level, quantifying how much of this information arises from the difference in non-linear evolution between a cosmology with 1 fluid (CDM) and 2 fluids (CDM + neutrinos). We do so by running two $N$-body simulations, one with and one without massive neutrinos, both with the same phases and with their linear power spectra matched at a given low redshift. This effectively isolates the information encoded in the linear initial conditions from the non-linear cosmic evolution. We demonstrate that for $k \lesssim 1\,h/{\rm Mpc}$, and for a single redshift, there is negligible difference in the real-space CDM field between the two simulations. This suggests that all the information regarding neutrino mass is in the linear power spectrum set by the initial conditions. Thus any probe based on the CDM field alone will have negligible constraining power beyond that which exists at the linear level over the same range of scales. Consequently, any probe based on the halo field will contain little information beyond the linear power. We find similar results for the matter field responsible for weak lensing. We also demonstrate that there may be much information beyond the power spectrum in the 3D matter field; however, this is not observable in modern surveys via dark matter halos or weak lensing. Finally, we show that there is additional information to be found in redshift space.
Submitted 15 June, 2022; v1 submitted 9 August, 2021;
originally announced August 2021.
-
The GIGANTES dataset: precision cosmology from voids in the machine learning era
Authors:
Christina D. Kreisch,
Alice Pisani,
Francisco Villaescusa-Navarro,
David N. Spergel,
Benjamin D. Wandelt,
Nico Hamaus,
Adrian E. Bayer
Abstract:
We present GIGANTES, the most extensive and realistic void catalog suite ever released -- containing over 1 billion cosmic voids covering a volume larger than the observable Universe, more than 20 TB of data, and created by running the void finder VIDE on QUIJOTE's halo simulations. The expansive and detailed GIGANTES suite, spanning thousands of cosmological models, opens up the study of voids, answering compelling questions: Do voids carry unique cosmological information? How is this information correlated with galaxy information? Leveraging the large number of voids in the GIGANTES suite, our Fisher constraints demonstrate voids contain additional information, critically tightening constraints on cosmological parameters. We use traditional void summary statistics (void size function, void density profile) and the void auto-correlation function, which independently yields an error of $0.13\,\mathrm{eV}$ on $\sum\,m_ν$ for a 1 $h^{-3}\mathrm{Gpc}^3$ simulation, without CMB priors. Combining halos and voids we forecast an error of $0.09\,\mathrm{eV}$ from the same volume. Extrapolating to next generation multi-Gpc$^3$ surveys such as DESI, Euclid, SPHEREx, and the Roman Space Telescope, we expect voids should yield an independent determination of neutrino mass. Crucially, GIGANTES is the first void catalog suite expressly built for intensive machine learning exploration. We illustrate this by training a neural network to perform likelihood-free inference on the void size function. Cosmology problems provide an impetus to develop novel deep learning techniques, leveraging the symmetries embedded throughout the universe from physical laws, interpreting models, and accurately predicting errors. With GIGANTES, machine learning gains an impressive dataset, offering unique problems that will stimulate new techniques.
Submitted 22 July, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Detecting Neutrino Mass by Combining Matter Clustering, Halos, and Voids
Authors:
Adrian E. Bayer,
Francisco Villaescusa-Navarro,
Elena Massara,
Jia Liu,
David N. Spergel,
Licia Verde,
Benjamin D. Wandelt,
Matteo Viel,
Shirley Ho
Abstract:
We quantify the information content of the non-linear matter power spectrum, the halo mass function, and the void size function, using the Quijote $N$-body simulations. We find that these three statistics exhibit very different degeneracies amongst the cosmological parameters, and thus the combination of all three probes enables the breaking of degeneracies, in turn yielding remarkably tight constraints. We perform a Fisher analysis using the full covariance matrix, including all auto- and cross-correlations, finding that this increases the information content for neutrino mass compared to a correlation-free analysis. The multiplicative improvements in the constraints on the cosmological parameters obtained by combining all three probes, compared to using the power spectrum alone, are 137, 5, 8, 20, 10, and 43 for $Ω_m$, $Ω_b$, $h$, $n_s$, $σ_8$, and $M_ν$, respectively. The marginalized error on the sum of the neutrino masses is $σ(M_ν)=0.018\,{\rm eV}$ for a cosmological volume of $1\,(h^{-1}{\rm Gpc})^3$, using $k_{\max}=0.5\,h{\rm Mpc}^{-1}$, and without CMB priors. We note that this error is an underestimate insomuch as we do not consider super-sample covariance, baryonic effects, and realistic survey noise and systematics. On the other hand, it is an overestimate insomuch as our cuts and binning are suboptimal due to restrictions imposed by the simulation resolution. Given that upcoming galaxy surveys will observe volumes spanning $\sim 100\,(h^{-1}{\rm Gpc})^3$, this presents a promising new avenue to measure neutrino mass without being restricted by the need for accurate knowledge of the optical depth, which is required for CMB-based measurements. Furthermore, the improved constraints on other cosmological parameters, notably $Ω_m$, may also be competitive with CMB-based measurements.
Submitted 20 September, 2021; v1 submitted 9 February, 2021;
originally announced February 2021.
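The Fisher analysis with a full covariance matrix mentioned in the abstract above follows the standard Gaussian form $F_{ab} = \sum_{ij} \partial_a d_i\,(C^{-1})_{ij}\,\partial_b d_j$, with marginalized errors $\sqrt{(F^{-1})_{aa}}$. The toy sketch below uses two parameters and two correlated statistics; every number, the derivative matrix, and the helper names are hypothetical and are not taken from the paper.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def fisher_matrix(jac, cov):
    """F_ab = sum_ij jac[i][a] * Cinv[i][j] * jac[j][b] (2 data, 2 params)."""
    cinv = inv2(cov)
    idx = range(2)
    return [[sum(jac[i][a] * cinv[i][j] * jac[j][b] for i in idx for j in idx)
             for b in idx] for a in idx]


def marginalized_errors(fisher):
    """Marginalized 1-sigma errors: sqrt of the diagonal of F^-1."""
    finv = inv2(fisher)
    return [math.sqrt(finv[0][0]), math.sqrt(finv[1][1])]


# Toy setup: statistic 0 responds to theta_0 + theta_1 and statistic 1 to
# theta_0 - theta_1, so each alone is fully degenerate; together they are not.
jac = [[1.0, 1.0], [1.0, -1.0]]   # jac[i][a] = d(data_i) / d(theta_a)
cov = [[1.0, 0.5], [0.5, 1.0]]    # includes the cross-correlation term
errors = marginalized_errors(fisher_matrix(jac, cov))
```

In this toy case the off-diagonal covariance changes the two marginalized errors in opposite directions, which illustrates why cross-correlations have to be kept in the covariance rather than set to zero.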
-
The look-elsewhere effect from a unified Bayesian and frequentist perspective
Authors:
Adrian E. Bayer,
Uros Seljak
Abstract:
When searching over a large parameter space for anomalies such as events, peaks, objects, or particles, there is a large probability that spurious signals with seemingly high significance will be found. This is known as the look-elsewhere effect and is prevalent throughout cosmology, (astro)particle physics, and beyond. To avoid making false claims of detection, one must account for this effect when assigning the statistical significance of an anomaly. This is typically accomplished by considering the trials factor, which is generally computed numerically via potentially expensive simulations. In this paper we develop a continuous generalization of the Bonferroni and Sidak corrections by applying the Laplace approximation to evaluate the Bayes factor, and in turn relating the trials factor to the prior-to-posterior volume ratio. We use this to define a test statistic whose frequentist properties have a simple interpretation in terms of the global $p$-value, or statistical significance. We apply this method to various physics-based examples and show it to work well for the full range of $p$-values, i.e. in both the asymptotic and non-asymptotic regimes. We also show that this method naturally accounts for other model complexities such as additional degrees of freedom, generalizing Wilks' theorem. This provides a fast way to quantify statistical significance in light of the look-elsewhere effect, without resorting to expensive simulations.
Submitted 27 July, 2020;
originally announced July 2020.
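For context on the corrections named in the abstract above, the discrete Bonferroni and Sidak corrections turn a local $p$-value into a global one for a given trials factor. The sketch below shows only these textbook discrete forms, assuming independent trials for Sidak; it is not the continuous generalization the paper develops, and the function names are illustrative.

```python
def bonferroni_global_p(p_local, n_trials):
    """Bonferroni bound on the global p-value for n_trials looks."""
    return min(1.0, n_trials * p_local)


def sidak_global_p(p_local, n_trials):
    """Sidak global p-value, assuming the n_trials looks are independent."""
    return 1.0 - (1.0 - p_local) ** n_trials


# A local p-value of 1e-3 searched over 100 independent trials.
p_bonf = bonferroni_global_p(1e-3, 100)
p_sidak = sidak_global_p(1e-3, 100)
```

The two agree when `n_trials * p_local` is small and differ as it approaches 1, where the Bonferroni bound saturates.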
-
A fast particle-mesh simulation of non-linear cosmological structure formation with massive neutrinos
Authors:
Adrian E. Bayer,
Arka Banerjee,
Yu Feng
Abstract:
Quasi-N-body simulations, such as FastPM, provide a fast way to simulate cosmological structure formation, but have yet to adequately include the effects of massive neutrinos. We present a method to include neutrino particles in FastPM, enabling computation of the CDM and total matter power spectra to percent-level accuracy in the non-linear regime. The CDM-neutrino cross-power can also be computed at a sufficient accuracy to constrain cosmological observables. To avoid the shot noise that typically plagues neutrino particle simulations, we employ a quasi-random algorithm to sample the relevant Fermi-Dirac distribution when setting the initial neutrino thermal velocities. We additionally develop an effective distribution function to describe a set of non-degenerate neutrinos as a single particle to speed up non-degenerate simulations. The simulation is accurate for the full range of physical interest, $M_ν\lesssim 0.6\,{\rm eV}$, and applicable to redshifts $z\lesssim2$. Such accuracy can be achieved by initializing particles with the two-fluid approximation transfer functions (using the REPS package). Convergence can be reached in $\sim 25$ steps, with a starting redshift of $z=99$. Probing progressively smaller scales only requires an increase in the number of CDM particles being simulated, while the number of neutrino particles can remain fixed at a value less than or similar to the number of CDM particles. In turn, the percentage increase in runtime-per-step due to neutrino particles is $\sim 5$-$20\%$ for runs with $1024^3$ CDM particles, and decreases as the number of CDM particles is increased. The code has been made publicly available, providing an invaluable resource for producing fast predictions for cosmological surveys and for studying reconstruction.
Submitted 14 January, 2021; v1 submitted 27 July, 2020;
originally announced July 2020.
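The idea of quasi-random sampling of a Fermi-Dirac distribution, mentioned in the abstract above, can be illustrated by pushing a low-discrepancy sequence through the inverse CDF of $f(x) \propto x^2/(e^x+1)$, where $x$ is the dimensionless momentum. This is a generic sketch and not the FastPM implementation: the van der Corput sequence, the grid-based CDF inversion, the cutoff `x_max`, and all function names are assumptions made for illustration.

```python
import bisect
import math


def van_der_corput(index, base=2):
    """index-th term (index >= 1) of the van der Corput low-discrepancy sequence."""
    value, denom = 0.0, 1.0
    while index > 0:
        denom *= base
        index, digit = divmod(index, base)
        value += digit / denom
    return value


def fermi_dirac_samples(n_samples, x_max=20.0, n_grid=4000):
    """Quasi-random draws of x = p / T from f(x) ~ x^2 / (exp(x) + 1)."""
    dx = x_max / n_grid
    xs = [i * dx for i in range(n_grid + 1)]
    pdf = [x * x / (math.exp(x) + 1.0) for x in xs]
    # Tabulate the CDF with the trapezoid rule, then normalize it to 1.
    cdf = [0.0]
    for i in range(n_grid):
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i + 1]) * dx)
    total = cdf[-1]
    cdf = [c / total for c in cdf]
    samples = []
    for k in range(1, n_samples + 1):
        u = van_der_corput(k)            # evenly spread value in (0, 1)
        j = bisect.bisect_left(cdf, u)   # cdf[j - 1] < u <= cdf[j]
        t = (u - cdf[j - 1]) / (cdf[j] - cdf[j - 1])
        samples.append(xs[j - 1] + t * dx)  # linear interpolation in the bin
    return samples


momenta = fermi_dirac_samples(1024)
mean_x = sum(momenta) / len(momenta)
```

Because the sequence covers the unit interval evenly rather than randomly, the sample mean lands close to the analytic value $7\pi^4 / (180\,\zeta(3)) \approx 3.15$ with far fewer draws than pseudo-random sampling would typically need, which is the shot-noise motivation given in the abstract.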