-
Neural Posterior Estimation for Cataloging Astronomical Images with Spatially Varying Backgrounds and Point Spread Functions
Authors:
Aakash Patel,
Tianqing Zhang,
Camille Avestruz,
Jeffrey Regier,
the LSST Dark Energy Science Collaboration
Abstract:
Neural posterior estimation (NPE), a type of amortized variational inference, is a computationally efficient means of constructing probabilistic catalogs of light sources from astronomical images. To date, NPE has not been used to perform inference in models with spatially varying covariates. However, ground-based astronomical images have spatially varying sky backgrounds and point spread functions (PSFs), and accounting for this variation is essential for constructing accurate catalogs of imaged light sources. In this work, we introduce a method of performing NPE with spatially varying backgrounds and PSFs. In this method, we generate synthetic catalogs and semi-synthetic images for these catalogs using randomly sampled PSF and background estimates from existing surveys. Using this data, we train a neural network, which takes an astronomical image and representations of its background and PSF as input, to output a probabilistic catalog. Our experiments with Sloan Digital Sky Survey data demonstrate the effectiveness of NPE in the presence of spatially varying backgrounds and PSFs for light source detection, star/galaxy separation, and flux measurement.
Submitted 28 February, 2025;
originally announced March 2025.
-
Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data
Authors:
Mulugeta Weldezgina Asres,
Christian Walter Omlin,
The CMS-HCAL Collaboration
Abstract:
Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a more extensive set of monitoring variables across multiple subsystems. However, learning causal graphs comes with a significant computational burden that restrains the applicability of most existing methods in real-time and large-scale deployments. In addition, modern monitoring applications for large systems often generate large amounts of binary alarm flags, and the distinct characteristics of binary anomaly data -- the meaning of state transition and data sparsity -- challenge existing causality learning mechanisms. This study proposes an anomaly causal discovery approach (AnomalyCD), addressing the accuracy and computational challenges of generating causal graphs from binary flag data sets. The AnomalyCD framework presents several strategies, such as anomaly flag characteristics incorporating causality testing, sparse data and link compression, and edge pruning adjustment approaches. We validate the performance of this framework on two datasets: monitoring sensor data of the readout-box system of the Compact Muon Solenoid experiment at CERN, and a public data set for information technology monitoring. The results demonstrate the considerable reduction of the computation overhead and moderate enhancement of the accuracy of temporal causal discovery on binary anomaly data sets.
Submitted 16 December, 2024;
originally announced December 2024.
-
Euclid preparation. LIII. LensMC, weak lensing cosmic shear measurement with forward modelling and Markov Chain Monte Carlo sampling
Authors:
Euclid Collaboration,
G. Congedo,
L. Miller,
A. N. Taylor,
N. Cross,
C. A. J. Duncan,
T. Kitching,
N. Martinet,
S. Matthew,
T. Schrabback,
M. Tewes,
N. Welikala,
N. Aghanim,
A. Amara,
S. Andreon,
N. Auricchio,
M. Baldi,
S. Bardelli,
R. Bender,
C. Bodendorf,
D. Bonino,
E. Branchini,
M. Brescia,
J. Brinchmann,
S. Camera
, et al. (217 additional authors not shown)
Abstract:
LensMC is a weak lensing shear measurement method developed for Euclid and Stage-IV surveys. It is based on forward modelling in order to deal with convolution by a point spread function (PSF) with comparable size to many galaxies; sampling the posterior distribution of galaxy parameters via Markov Chain Monte Carlo; and marginalisation over nuisance parameters for each of the 1.5 billion galaxies observed by Euclid. We quantified the scientific performance through high-fidelity images based on the Euclid Flagship simulations and emulation of the Euclid VIS images; realistic clustering with a mean surface number density of 250 arcmin$^{-2}$ ($I_{\rm E}<29.5$) for galaxies, and 6 arcmin$^{-2}$ ($I_{\rm E}<26$) for stars; and a diffraction-limited chromatic PSF with a full width at half maximum of $0.^{\!\prime\prime}2$ and spatial variation across the field of view. LensMC measured objects with a density of 90 arcmin$^{-2}$ ($I_{\rm E}<26.5$) in 4500 deg$^2$. The total shear bias was broken down into measurement (our main focus here) and selection effects (which will be addressed elsewhere). We found measurement multiplicative and additive biases of $m_1=(-3.6\pm0.2)\times10^{-3}$, $m_2=(-4.3\pm0.2)\times10^{-3}$, $c_1=(-1.78\pm0.03)\times10^{-4}$, $c_2=(0.09\pm0.03)\times10^{-4}$; a large detection bias with a multiplicative component of $1.2\times10^{-2}$ and an additive component of $-3\times10^{-4}$; and a measurement PSF leakage of $\alpha_1=(-9\pm3)\times10^{-4}$ and $\alpha_2=(2\pm3)\times10^{-4}$. When model bias is suppressed, the obtained measurement biases are close to the Euclid requirement and largely dominated by undetected faint galaxies ($-5\times10^{-3}$). Although significant, model bias will be straightforward to calibrate given the weak sensitivity. LensMC is publicly available at https://gitlab.com/gcongedo/LensMC
Submitted 2 December, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
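As a reading aid for the multiplicative and additive biases quoted in the abstract above, the sketch below assumes the conventional linear bias model used in weak lensing, g_obs = (1 + m) g_true + c, and recovers m and c by ordinary least squares. The abstract does not spell out this model, and the function name `fit_linear_bias`, the input shear grid, and the noise-free setup are illustrative assumptions, not the LensMC pipeline. The injected values are the m_1 and c_1 central values quoted in the abstract.

```python
def fit_linear_bias(g_true, g_obs):
    """Least-squares fit of g_obs = (1 + m) * g_true + c; returns (m, c).

    Hypothetical helper for illustration only; not part of LensMC.
    """
    n = len(g_true)
    mean_t = sum(g_true) / n
    mean_o = sum(g_obs) / n
    # Slope and intercept of the ordinary least-squares line.
    sxx = sum((t - mean_t) ** 2 for t in g_true)
    sxy = sum((t - mean_t) * (o - mean_o) for t, o in zip(g_true, g_obs))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_t
    # The multiplicative bias is the departure of the slope from unity.
    return slope - 1.0, intercept


# Assumed input shears (illustrative grid, not from the paper).
g_true = [-0.04, -0.02, 0.0, 0.02, 0.04]

# Inject the central values of m_1 and c_1 quoted in the abstract.
m_in, c_in = -3.6e-3, -1.78e-4
g_obs = [(1.0 + m_in) * g + c_in for g in g_true]

m_fit, c_fit = fit_linear_bias(g_true, g_obs)
print(f"m = {m_fit:.2e}, c = {c_fit:.2e}")
```

In a real analysis the observed shears are noisy, so the fit would carry uncertainties like the ± values reported above; this noise-free example only shows what the two parameters mean.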
-
Semi-Supervised Record Linkage for Construction of Large-Scale Sociocentric Networks in Resource-limited Settings: An application to the SEARCH Study in Rural Uganda and Kenya
Authors:
Yiqun Chen,
Wenjing Zheng,
Lillian B. Brown,
Gabriel Chamie,
Dalsone Kwarisiima,
Jane Kabami,
Tamara D. Clark,
Norton Sang,
James Ayieko,
Edwin D. Charlebois,
Vivek Jain,
Laura Balzer,
Moses R Kamya,
Diane Havlir,
Maya Petersen,
the SEARCH Collaboration
Abstract:
This paper presents a novel semi-supervised algorithmic approach to creating large-scale sociocentric networks in rural East Africa. We describe the construction of 32 large-scale sociocentric social networks in rural Sub-Saharan Africa. Networks were constructed by applying a semi-supervised record-linkage algorithm to data from census-enumerated residents of the 32 communities included in the SEARCH study (NCT01864603), a community-cluster randomized HIV prevention trial in Uganda and Kenya. Contacts were solicited using a five-question name generator in the domains of emotional support, food sharing, free time, health issues and money issues. The fully constructed networks include 170,028 nodes and 362,965 edges aggregated across communities (ranging from 4,449 to 6,829 nodes and from 2,349 to 31,779 edges per community). Our algorithm matched on average 30% of named contacts in Kenyan communities and 50% of named contacts in Ugandan communities to residents named in census enumeration. Assortative mixing measures for eight different covariates reveal that residents in the network have a very strong tendency to associate with others who are similar to them in age, sex, and especially village. The networks in the SEARCH Study will provide a platform for improved understanding of health outcomes in rural East Africa. The network construction algorithm we present may facilitate future social network research in resource-limited settings.
Submitted 23 August, 2019;
originally announced August 2019.
-
Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis
Authors:
Sean McGrath,
XiaoFei Zhao,
Russell Steele,
Brett D. Thombs,
Andrea Benedetti,
the DEPRESsion Screening Data Collaboration
Abstract:
Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume that the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.
Submitted 25 March, 2019;
originally announced March 2019.
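To illustrate the limitation described in the abstract above, the sketch below implements a simple normal-theory approximation of the kind the existing methods rely on: the mean is taken as the average of the three quartiles, and the standard deviation as the interquartile range divided by twice the 0.75 standard normal quantile. This is a large-sample simplification written for illustration; it is not the authors' proposed approach, and the function name `normal_based_estimates` is hypothetical. The example compares a normal outcome, where the approximation is exact, with a skewed log-normal outcome, where it is not.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # 0.75 quantile of the standard normal


def normal_based_estimates(q1, median, q3):
    """Approximate (mean, SD) from quartiles, ASSUMING a normal outcome.

    Hypothetical helper for illustration; large-sample approximation only.
    """
    mean = (q1 + median + q3) / 3.0
    sd = (q3 - q1) / (2.0 * Z75)
    return mean, sd


# Case 1: a normal outcome with mean 10 and SD 2 (population quartiles).
normal = NormalDist(mu=10.0, sigma=2.0)
mean_n, sd_n = normal_based_estimates(
    normal.inv_cdf(0.25), normal.inv_cdf(0.5), normal.inv_cdf(0.75)
)

# Case 2: a skewed outcome, log-normal with log-scale mean 0 and SD 1.
# Its quartiles are exp(-Z75), 1, exp(Z75); its true moments are known.
mean_s, sd_s = normal_based_estimates(math.exp(-Z75), 1.0, math.exp(Z75))
true_mean_s = math.exp(0.5)
true_sd_s = math.sqrt((math.e - 1.0) * math.e)

print("normal:     estimated", (mean_n, sd_n), "true", (10.0, 2.0))
print("log-normal: estimated", (mean_s, sd_s), "true", (true_mean_s, true_sd_s))
```

For the skewed case the normal-based estimates fall below both the true mean and the true standard deviation, which is the kind of error that motivates methods designed for non-normal data.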