-
ASASSN-24hd; a dwarf nova bridging WZ Sge-type and SU UMa-type superoutbursts
Authors:
Yusuke Tampo,
Naoto Kojiguchi,
Taichi Kato,
Mariko Kimura,
David A. H. Buckley,
Berto Monard,
Franz-Josef Hambsch,
Katsuki Muraoka,
Daisaku Nogami,
Stephen B. Potter,
Anke van Dyk,
Patrick Woudt
Abstract:
WZ Sge-type dwarf novae (DNe) form a subclass in cataclysmic variables, characterized by short-period variations called superhumps during an outburst. Here we present optical ground-based and TESS observations of ASASSN-24hd in its 2024-2025 outburst. ASASSN-24hd is the first reported WZ Sge-type DN outburst fully covered by TESS, providing a great opportunity to study the evolution of superhumps. Our observations establish the periods of its early and stage-A ordinary superhumps as 0.05711(4) and 0.05919(5) d, respectively, resulting in a mass ratio of 0.098(4). The TESS observations confirm that the evolution of its superhump period, amplitude, and profile after the appearance of ordinary superhumps is generally consistent with those of SU UMa-type DNe observed with Kepler and TESS. Furthermore, we find that ASASSN-24hd in outburst closely resembles the 2010 superoutburst of the SU UMa-type DN V585 Lyr, observed by Kepler, particularly in the superhump evolution and the long waiting time ($\gtrsim$ 5 d) before the stage A--B transition of ordinary superhumps. The shorter superoutburst cycles and smaller outburst amplitude of V585 Lyr compared with those of ASASSN-24hd disfavor the interpretation that V585 Lyr is, in fact, a face-on WZ Sge-type DN in which early superhumps are undetectable. Instead, the critical difference between them may be either low quiescent viscosity or inner disk truncation, both of which have been invoked to explain the extreme nature of WZ Sge-type DNe, but future observations in quiescence are vital to reach a conclusion. These findings emphasize the borderline between SU UMa-type and WZ Sge-type DNe.
Submitted 29 April, 2025;
originally announced April 2025.
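The two periods quoted in the ASASSN-24hd abstract can be turned into a fractional superhump excess with simple arithmetic. The sketch below assumes, which the abstract does not state, that the early superhump period stands in for the orbital period, and it uses the common definition eps* = 1 - P_orb / P_SH. Converting eps* into the quoted mass ratio needs a dynamical precession model that is not reproduced here. The function name is illustrative only.

```python
def superhump_excess(p_orb_days: float, p_sh_days: float) -> float:
    """Fractional superhump excess in frequency, eps* = 1 - P_orb / P_SH."""
    if p_orb_days <= 0 or p_sh_days <= 0:
        raise ValueError("periods must be positive")
    return 1.0 - p_orb_days / p_sh_days


# Periods quoted in the abstract (days). Treating the early superhump
# period as a proxy for the orbital period is an assumption of this sketch.
P_EARLY = 0.05711
P_STAGE_A = 0.05919

eps_star = superhump_excess(P_EARLY, P_STAGE_A)
print(f"eps* = {eps_star:.4f}")
```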
-
StratLearn-z: Improved photo-$z$ estimation from spectroscopic data subject to selection effects
Authors:
Chiara Moretti,
Maximilian Autenrieth,
Riccardo Serra,
Roberto Trotta,
David A. van Dyk,
Andrei Mesinger
Abstract:
A precise measurement of photometric redshifts (photo-z) is key for the success of modern photometric galaxy surveys. Machine learning (ML) methods show great promise in this context, but suffer from covariate shift (CS) in training sets due to selection bias where interesting sources are underrepresented, and the corresponding ML models show poor generalisation properties. We present an application of the StratLearn method to the estimation of photo-z, validating against simulations where we enforce the presence of CS to different degrees. StratLearn is a statistically principled approach that relies on splitting the source and target datasets into strata based on estimated propensity scores (i.e. the probability for an object to be in the source set given its observed covariates). After stratification, two conditional density estimators are fit separately to each stratum, then combined via a weighted average. We benchmark our results against the GPz algorithm, quantifying the performance of the two codes with a set of metrics. Our results show that the StratLearn-z metrics are only marginally affected by the presence of CS, while GPz shows a significant degradation of performance in the photo-z prediction for fainter objects. For the strongest CS scenario, StratLearn-z yields a reduced fraction of catastrophic errors, a factor of 2 improvement for the RMSE and one order of magnitude improvement on the bias. We also assess the quality of the conditional redshift estimates with the probability integral transform (PIT). The PIT distribution obtained from StratLearn-z features far fewer outliers and is symmetric, i.e. the predictions appear to be centered around the true redshift value, despite showing a conservative estimation of the spread of the conditional redshift distributions. Our julia implementation of the method is available at https://github.com/chiaramoretti/StratLearn-z.
Submitted 30 April, 2025; v1 submitted 30 September, 2024;
originally announced September 2024.
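The stratification step described in the StratLearn-z abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes propensity scores have already been estimated, and it assumes strata are cut at equal-probability quantiles of the pooled source and target scores. The function name, the simulated scores, and the choice of five strata are hypothetical.

```python
import numpy as np


def assign_strata(propensity: np.ndarray, n_strata: int = 5) -> np.ndarray:
    """Return a stratum index in [0, n_strata) for each propensity score.

    Cut points are equal-probability quantiles of the pooled scores, so each
    stratum holds roughly the same number of objects.
    """
    edges = np.quantile(propensity, np.linspace(0.0, 1.0, n_strata + 1))
    # Use interior edges only; searchsorted maps each score to its bin.
    return np.searchsorted(edges[1:-1], propensity, side="right")


rng = np.random.default_rng(0)
# Hypothetical propensity scores P(source | covariates) for 1000 pooled objects.
scores = rng.beta(2.0, 5.0, size=1000)
# Simulated source membership, drawn with probability equal to the score.
is_source = rng.random(1000) < scores

strata = assign_strata(scores)
for k in range(5):
    in_k = strata == k
    # Within a stratum the source fraction should track the typical score.
    print(k, int(in_k.sum()), round(float(is_source[in_k].mean()), 3))
```

A learner (for example a conditional density estimator) would then be fit separately on the source objects of each stratum and applied to the target objects in the same stratum.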
-
A Broadband X-ray Investigation of Fast-Spinning Intermediate Polar CTCV J2056-3014
Authors:
Ciro Salcedo,
Kaya Mori,
Gabriel Bridges,
Charles J. Hailey,
David A. H. Buckley,
Raimundo Lopes de Oliveira,
Gavin Ramsay,
Anke van Dyk
Abstract:
We report on XMM-Newton, NuSTAR, and NICER X-ray observations of CTCV J2056-3014, a cataclysmic variable (CV) with one of the fastest-spinning white dwarfs (WDs) at P = 29.6 s. While previously classified as an intermediate polar (IP), CJ2056 also exhibits the properties of WZ-Sge-type CVs, such as dwarf novae and superoutbursts. With XMM-Newton and NICER, we detected the spin period up to approximately 2 keV with 7-$σ$ significance. We constrained its derivative to |$\dot{P}$| < 1.8e-12 s/s after correcting for binary orbital motion. The pulsed profile is characterized by a single broad peak with approximately 25% modulation. NuSTAR detected a four-fold increase in unabsorbed X-ray flux coincident with an optical flare in November 2022. The XMM-Newton and NICER X-ray spectra in 0.3-10 keV are best characterized by an absorbed optically-thin three-temperature thermal plasma model (kT = 0.3, 1.0, and 4.9 keV), while the NuSTAR spectra in 3-30 keV are best fit by a single-temperature thermal plasma model (kT = 8.4 keV), both with Fe abundance $Z_{Fe}/Z_\odot$ = 0.3. CJ2056 exhibits similarities to other fast-spinning CVs, such as low plasma temperatures, and no significant X-ray absorption at low energies. As the WD's magnetic field strength is unknown, we applied both non-magnetic and magnetic CV spectral models (MKCFLOW and MCVSPEC) to determine the WD mass. The derived WD mass range (M = 0.7-1.0 $M_\odot$) is above the centrifugal break-up mass limit of 0.56 $M_\odot$ and consistent with the mean WD mass of local CVs (M $\approx$ 0.8-0.9 $M_\odot$).
Submitted 26 September, 2024;
originally announced September 2024.
-
Six Maxims of Statistical Acumen for Astronomical Data Analysis
Authors:
Hyungsuk Tak,
Yang Chen,
Vinay L. Kashyap,
Kaisey S. Mandel,
Xiao-Li Meng,
Aneta Siemiginowska,
David A. van Dyk
Abstract:
The production of complex astronomical data is accelerating, especially with newer telescopes producing ever more large-scale surveys. The increased quantity, complexity, and variety of astronomical data demand a parallel increase in skill and sophistication in developing, deciding, and deploying statistical methods. Understanding limitations and appreciating nuances in statistical and machine learning methods and the reasoning behind them is essential for improving data-analytic proficiency and acumen. Aiming to facilitate such improvement in astronomy, we delineate cautionary tales in statistics via six maxims, with examples drawn from the astronomical literature. Inspired by the significant quality improvement in business and manufacturing processes by the routine adoption of Six Sigma, we hope the routine reflection on these Six Maxims will improve the quality of both data analysis and scientific findings in astronomy.
Submitted 4 October, 2024; v1 submitted 28 August, 2024;
originally announced August 2024.
-
Separating States in Astronomical Sources Using Hidden Markov Models: With a Case Study of Flaring and Quiescence on EV Lac
Authors:
Robert Zimmerman,
David A. van Dyk,
Vinay L. Kashyap,
Aneta Siemiginowska
Abstract:
We present a new method to distinguish between different states (e.g., high and low, quiescent and flaring) in astronomical sources with count data. The method models the underlying physical process as latent variables following a continuous-space Markov chain that determines the expected Poisson counts in observed light curves in multiple passbands. For the underlying state process, we consider several autoregressive processes, yielding continuous-space hidden Markov models of varying complexity. Under these models, we can infer the state that the object is in at any given time. The continuous state predictions from these models are then dichotomized with the help of a finite mixture model to produce state classifications. We apply these techniques to X-ray data from the active dMe flare star EV Lac, splitting the data into quiescent and flaring states. We find that a first-order vector autoregressive process efficiently separates flaring from quiescence: flaring occurs over 30-40% of the observation durations, a well-defined persistent quiescent state can be identified, and the flaring state is characterized by higher plasma temperatures and emission measures.
Submitted 3 September, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Effect of Systematic Uncertainties on Density and Temperature Estimates in Coronae of Capella
Authors:
Xixi Yu,
Vinay L. Kashyap,
Giulio Del Zanna,
David A. van Dyk,
David C. Stenning,
Connor P. Ballance,
Harry P. Warren
Abstract:
We estimate the coronal density of Capella using the O VII and Fe XVII line systems in the soft X-ray regime that have been observed over the course of the Chandra mission. Our analysis combines measures of error due to uncertainty in the underlying atomic data with statistical errors in the Chandra data to derive meaningful overall uncertainties on the plasma density of the coronae of Capella. We consider two Bayesian frameworks. First, the so-called pragmatic-Bayesian approach considers the atomic data and their uncertainties as fully specified and uncorrectable. The fully-Bayesian approach, on the other hand, allows the observed spectral data to update the atomic data and their uncertainties, thereby reducing the overall errors on the inferred parameters. To incorporate atomic data uncertainties, we obtain a set of atomic data replicates, the distribution of which captures their uncertainty. A principal component analysis of these replicates allows us to represent the atomic uncertainty with a lower-dimensional multivariate Gaussian distribution. A $t$-distribution approximation of the uncertainties of a subset of plasma parameters including a priori temperature information, obtained from the temperature-sensitive-only Fe XVII spectral line analysis, is carried forward into the density- and temperature-sensitive O VII spectral line analysis. Markov Chain Monte Carlo based model fitting is implemented including Multi-step Monte Carlo Gibbs Sampler and Hamiltonian Monte Carlo. Our analysis recovers an isothermally approximated coronal plasma temperature of $\approx$5 MK and a coronal plasma density of $\approx$10$^{10}$ cm$^{-3}$, with uncertainties of 0.1 and 0.2 dex respectively.
Submitted 18 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Joint Deconvolution of Astronomical Images in the Presence of Poisson Noise
Authors:
Axel Donath,
Aneta Siemiginowska,
Vinay L. Kashyap,
David A. van Dyk,
Douglas Burke
Abstract:
We present a new method for joint likelihood deconvolution (Jolideco) of a set of astronomical observations of the same sky region in the presence of Poisson noise. The observations may be obtained from different instruments with different resolutions and different point spread functions. Jolideco reconstructs a single flux image by optimizing the posterior distribution based on the joint Poisson likelihood of all observations under a patch-based image prior. The patch prior is parameterised via a Gaussian Mixture model which we train on high-signal-to-noise astronomical images, including data from the James Webb Telescope and the GLEAM radio survey. This prior favors correlation structures among the reconstructed pixel intensities that are characteristic of those observed in the training images. It is, however, not informative for the mean or scale of the reconstruction. By applying the method to simulated data we show that the combination of multiple observations and the patch-based prior leads to much improved reconstruction quality in many different source scenarios and signal-to-noise regimes. We demonstrate that with the patch prior Jolideco yields superior reconstruction quality relative to alternative standard methods such as the Richardson-Lucy method. We illustrate the results of Jolideco applied to example data from the Chandra X-ray Observatory and the Fermi-LAT Gamma-ray Space Telescope. By comparing the measured widths of a counts-based profile and the corresponding Jolideco flux profile of an X-ray filament in SNR 1E 0102.2-721, we find the deconvolved width of $0.58 \pm 0.02$ arcsec to be consistent with the theoretical expectation derived from the known width of the PSF.
Submitted 20 March, 2024;
originally announced March 2024.
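The joint Poisson likelihood at the core of the Jolideco abstract can be sketched in one dimension. This is a toy illustration under stated assumptions, not the Jolideco code or its API: the forward model (exposure times the PSF-convolved flux plus a flat background), the dictionary layout, and all names are hypothetical, and the patch-based prior is omitted entirely.

```python
import numpy as np


def joint_poisson_loglike(flux, observations):
    """Sum of Poisson log-likelihoods, up to a flux-independent constant.

    Each observation is a dict with 'counts', 'psf', 'exposure', and
    'background'; predicted counts are
    exposure * (flux convolved with psf) + background.
    """
    total = 0.0
    for obs in observations:
        predicted = obs["exposure"] * np.convolve(flux, obs["psf"], mode="same")
        predicted = predicted + obs["background"]
        total += np.sum(obs["counts"] * np.log(predicted) - predicted)
    return total


rng = np.random.default_rng(1)
true_flux = np.full(32, 0.2)
true_flux[16] = 20.0  # one point-like source on a faint flat level

# Two hypothetical instruments: a sharp PSF with a short exposure and a
# broad PSF with a long exposure, both viewing the same 1D "sky".
observations = []
for psf, exposure in [
    (np.array([0.25, 0.5, 0.25]), 50.0),
    (np.array([0.1, 0.2, 0.4, 0.2, 0.1]), 200.0),
]:
    expected = exposure * np.convolve(true_flux, psf, mode="same") + 1.0
    observations.append(
        {"counts": rng.poisson(expected), "psf": psf,
         "exposure": exposure, "background": 1.0}
    )

flat_guess = np.full(32, true_flux.mean())
print(joint_poisson_loglike(true_flux, observations))
print(joint_poisson_loglike(flat_guess, observations))
```

A reconstruction would maximize this quantity plus a log-prior over the flux; because every observation shares the same flux array, each instrument constrains the same underlying image.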
-
Improved Weak Lensing Photometric Redshift Calibration via StratLearn and Hierarchical Modeling
Authors:
Maximilian Autenrieth,
Angus H. Wright,
Roberto Trotta,
David A. van Dyk,
David C. Stenning,
Benjamin Joachimi
Abstract:
Discrepancies between cosmological parameter estimates from cosmic shear surveys and from recent Planck cosmic microwave background measurements challenge the ability of the highly successful $Λ$CDM model to describe the nature of the Universe. To rule out systematic biases in cosmic shear survey analyses, accurate redshift calibration within tomographic bins is key. In this paper, we improve photo-$z$ calibration via Bayesian hierarchical modeling of full galaxy photo-$z$ conditional densities, by employing $\textit{StratLearn}$, a recently developed statistical methodology, which accounts for systematic differences in the distribution of the spectroscopic training/source set and the photometric target set. Using realistic simulations that were designed to resemble the KiDS+VIKING-450 dataset, we show that $\textit{StratLearn}$-estimated conditional densities improve the galaxy tomographic bin assignment, and that our $\textit{StratLearn}$-Bayesian framework leads to nearly unbiased estimates of the target population means. This leads to a factor of $\sim 2$ improvement upon the previously best photo-$z$ calibration method. Our approach delivers a maximum bias per tomographic bin of $Δ\langle z \rangle = 0.0095 \pm 0.0089$, with an average absolute bias of $0.0052 \pm 0.0067$ across the five tomographic bins.
Submitted 12 March, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
Rapid Evolution of the White Dwarf Pulsar AR Scorpii
Authors:
Peter Garnavich,
Stephen B. Potter,
David A. H. Buckley,
Anke van Dyk,
Daniel Egbo,
Colin Littlefield,
Anousha Greiveldinger
Abstract:
Analysis of AR Sco optical light curves spanning nine years shows a secular change in the relative amplitudes of the beat pulse pairs generated by the two magnetic poles of its rotating white dwarf. Recent photometry now shows that the primary and secondary beat pulses have similar amplitudes, while in 2015 the primary pulse had approximately twice the amplitude of the secondary pulse. The equalization in the beat pulse amplitudes is also seen in the linearly polarized flux. This rapid evolution is consistent with precession of the white dwarf spin axis. The observations imply that the pulse amplitudes cycle over a period of $\gtrsim 40$ yrs, but that the upper limit is currently poorly constrained. If precession is the mechanism driving the evolution, then over the next 10 years the ratio of the beat pulse amplitudes will reach a maximum followed by a return to asymmetric beat pulses.
Submitted 8 November, 2023;
originally announced November 2023.
-
Identifying diffuse spatial structures in high-energy photon lists
Authors:
Minjie Fan,
Jue Wang,
Vinay L. Kashyap,
Thomas C. M. Lee,
David A. van Dyk,
Andreas Zezas
Abstract:
Data from high-energy observations are usually obtained as lists of photon events. A common analysis task for such data is to identify whether diffuse emission exists, and to estimate its surface brightness, even in the presence of point sources that may be superposed. We have developed a novel non-parametric event list segmentation algorithm to divide up the field of view into distinct emission components. We use photon location data directly, without binning them into an image. We first construct a graph from the Voronoi tessellation of the observed photon locations and then grow segments using a new adaptation of seeded region growing, that we call Seeded Region Growing on Graph, after which the overall method is named SRGonG. Starting with a set of seed locations, this results in an over-segmented dataset, which SRGonG then coalesces using a greedy algorithm where adjacent segments are merged to minimize a model comparison statistic; we use the Bayesian Information Criterion. Using SRGonG we are able to identify point-like and diffuse extended sources in the data with equal facility. We validate SRGonG using simulations, demonstrating that it is capable of discerning irregularly shaped low surface-brightness emission structures as well as point-like sources with strengths comparable to that seen in typical X-ray data. We demonstrate SRGonG's use on the Chandra data of the Antennae galaxies, and show that it segments the complex structures appropriately.
Submitted 4 November, 2022; v1 submitted 15 August, 2022;
originally announced August 2022.
-
TD-CARMA: Painless, accurate, and scalable estimates of gravitational-lens time delays with flexible CARMA processes
Authors:
Antoine D. Meyer,
David A. van Dyk,
Hyungsuk Tak,
Aneta Siemiginowska
Abstract:
Cosmological parameters encoding our understanding of the expansion history of the Universe can be constrained by the accurate estimation of time delays arising in gravitationally lensed systems. We propose TD-CARMA, a Bayesian method to estimate cosmological time delays by modelling the observed and irregularly sampled light curves as realizations of a Continuous Auto-Regressive Moving Average (CARMA) process. Our model accounts for heteroskedastic measurement errors and microlensing, an additional source of independent extrinsic long-term variability in the source brightness. The semi-separable structure of the CARMA covariance matrix allows for fast and scalable likelihood computation using Gaussian Process modeling. We obtain a sample from the joint posterior distribution of the model parameters using a nested sampling approach. This allows for ``painless'' Bayesian Computation, dealing with the expected multi-modality of the posterior distribution in a straightforward manner and not requiring the specification of starting values or an initial guess for the time delay, unlike existing methods. In addition, the proposed sampling procedure automatically evaluates the Bayesian evidence, allowing us to perform principled Bayesian model selection. TD-CARMA is parsimonious, and typically includes no more than a dozen unknown parameters. We apply TD-CARMA to six doubly lensed quasars HS 2209+1914, SDSS J1001+5027, SDSS J1206+4332, SDSS J1515+1511, SDSS J1455+1447, SDSS J1349+1227, estimating their time delays as $-21.96 \pm 1.448$, $120.93 \pm 1.015$, $111.51 \pm 1.452$, $210.80 \pm 2.18$, $45.36 \pm 1.93$ and $432.05 \pm 1.950$ respectively. These estimates are consistent with those derived in the relevant literature, but are typically two to four times more precise.
Submitted 9 June, 2023; v1 submitted 19 July, 2022;
originally announced July 2022.
-
Concordance: In-flight Calibration of X-ray Telescopes without Absolute References
Authors:
Herman L. Marshall,
Yang Chen,
Jeremy J. Drake,
Matteo Guainazzi,
Vinay L. Kashyap,
Xiao-Li Meng,
Paul P. Plucinsky,
Peter Ratzlaff,
David A. van Dyk,
Xufei Wang
Abstract:
We describe a process for cross-calibrating the effective areas of X-ray telescopes that observe common targets. The targets are not assumed to be "standard candles" in the classic sense, in that we assume that the source fluxes have well-defined, but a priori unknown values. Using a technique developed by Chen et al. (2019, arXiv:1711.09429) that involves a statistical method called shrinkage estimation, we determine effective area correction factors for each instrument that bring estimated fluxes into the best agreement, consistent with prior knowledge of their effective areas. We expand the technique to allow unique priors on systematic uncertainties in effective areas for each X-ray astronomy instrument and to allow correlations between effective areas in different energy bands. We demonstrate the method with several data sets from various X-ray telescopes.
Submitted 30 August, 2021;
originally announced August 2021.
-
New Constraints on Anisotropic Expansion from Supernovae Type Ia
Authors:
W. Rahman,
R. Trotta,
S. S. Boruah,
M. J. Hudson,
D. A. van Dyk
Abstract:
We re-examine the contentious question of constraints on anisotropic expansion from Type Ia supernovae (SNIa) in the light of a novel determination of peculiar velocities, which are crucial to test isotropy with supernovae out to distances $\lesssim 200/h$ Mpc. We re-analyze the Joint Light-Curve Analysis (JLA) Supernovae (SNe) data, improving on previous treatments of peculiar velocity corrections and their uncertainties (both statistical and systematic) by adopting state-of-the-art flow models constrained independently via the 2M$++$ galaxy redshift compilation. We also introduce a novel procedure to account for colour-based selection effects, and adjust the redshift of low-$z$ SNe self-consistently in the light of our improved peculiar velocity model.
We adopt the Bayesian hierarchical model \texttt{BAHAMAS} to constrain a dipole in the distance modulus in the context of the $Λ$CDM model and the deceleration parameter in a phenomenological Cosmographic expansion. We do not find any evidence for anisotropic expansion, and place a tight upper bound on the amplitude of a dipole, $|D_μ| < 5.93 \times 10^{-4}$ (95\% credible interval) in a $Λ$CDM setting, and $|D_{q_0}| < 6.29 \times 10^{-2}$ in the Cosmographic expansion approach. Using Bayesian model comparison, we obtain posterior odds in excess of 900:1 (640:1) against a constant-in-redshift dipole for $Λ$CDM (the Cosmographic expansion). In the isotropic case, an accelerating universe is favoured with odds of $\sim 1100:1$ with respect to a decelerating one.
Submitted 28 April, 2022; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Stratified Learning: A General-Purpose Statistical Method for Improved Learning under Covariate Shift
Authors:
Maximilian Autenrieth,
David A. van Dyk,
Roberto Trotta,
David C. Stenning
Abstract:
We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative, a situation known as covariate shift. We build upon a well-established methodology in causal inference, and show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners within strata constructed by partitioning the data based on the estimated propensity scores, leading to approximately balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on two contemporary research questions in cosmology, outperforming state-of-the-art importance weighting methods. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge", and we improve upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data.
Submitted 17 May, 2023; v1 submitted 21 June, 2021;
originally announced June 2021.
-
eBASCS: Disentangling Overlapping Astronomical Sources II, using Spatial, Spectral, and Temporal Information
Authors:
Antoine D. Meyer,
David A. van Dyk,
Vinay L. Kashyap,
Luis F. Campos,
David E. Jones,
Aneta Siemiginowska,
Andreas Zezas
Abstract:
The analysis of individual X-ray sources that appear in a crowded field can easily be compromised by the misallocation of recorded events to their originating sources. Even with a small number of sources, that nonetheless have overlapping point spread functions, the allocation of events to sources is a complex task that is subject to uncertainty. We develop a Bayesian method designed to sift high-energy photon events from multiple sources with overlapping point spread functions, leveraging the differences in their spatial, spectral, and temporal signatures. The method probabilistically assigns each event to a given source. Such a disentanglement allows more detailed spectral or temporal analysis to focus on the individual component in isolation, free of contamination from other sources or the background. We are also able to compute source parameters of interest like their locations, relative brightness, and background contamination, while accounting for the uncertainty in event assignments. Simulation studies that include event arrival time information demonstrate that the temporal component improves event disambiguation beyond using only spatial and spectral information. The proposed methods correctly allocate up to 65% more events than the corresponding algorithms that ignore event arrival time information. We apply our methods to two stellar X-ray binaries, UV Cet and HBC515 A, observed with Chandra. We demonstrate that our methods are capable of removing the contamination due to a strong flare on UV Cet B in its companion approximately 40 times weaker during that event, and that evidence for spectral variability at timescales of a few ks can be determined in HBC515 Aa and HBC515 Ab.
Submitted 18 May, 2021;
originally announced May 2021.
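The probabilistic assignment of events to sources described in the eBASCS abstract can be illustrated with a spatial-only toy. This is a sketch under stated assumptions rather than the published method: it uses circular Gaussian PSFs, a uniform background, and fixed, known source positions and relative intensities, whereas the abstract describes a Bayesian fit that also uses spectral and temporal information and infers those parameters. All names and numbers are hypothetical.

```python
import numpy as np


def assignment_probabilities(events, centers, psf_sigma, weights, area):
    """Probability that each event came from each component.

    Components are the point sources (circular Gaussian PSF of width
    psf_sigma) followed by a uniform background over a region of the given
    area. `weights` holds the relative intensities, sources first and
    background last.
    """
    # Squared distance from every event to every source centre.
    d2 = ((events[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    psf = np.exp(-0.5 * d2 / psf_sigma**2) / (2.0 * np.pi * psf_sigma**2)
    background = np.full((events.shape[0], 1), 1.0 / area)
    dens = np.hstack([psf, background]) * weights[None, :]
    # Normalise each row so the probabilities for an event sum to one.
    return dens / dens.sum(axis=1, keepdims=True)


centers = np.array([[0.0, 0.0], [1.5, 0.0]])   # two overlapping sources
weights = np.array([0.4, 0.4, 0.2])            # source A, source B, background
events = np.array([[0.0, 0.0], [1.5, 0.0], [0.75, 0.0], [8.0, 8.0]])

probs = assignment_probabilities(events, centers, psf_sigma=1.0,
                                 weights=weights, area=400.0)
print(np.round(probs, 3))
```

An event midway between two equally bright sources is split evenly between them, which is the kind of ambiguity that additional spectral or arrival-time information can help resolve.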
-
Identification of high-energy astrophysical point sources via hierarchical Bayesian nonparametric clustering
Authors:
Andrea Sottosanti,
Mauro Bernardi,
Alessandra R. Brazzale,
Alex Geringer-Sameth,
David C. Stenning,
Roberto Trotta,
David A. van Dyk
Abstract:
The light we receive from distant astrophysical objects carries information about their origins and the physical mechanisms that power them. The study of these signals, however, is complicated by the fact that observations are often a mixture of the light emitted by multiple localized sources situated in a spatially-varying background. A general algorithm to achieve robust and accurate source identification in this case remains an open question in astrophysics.
This paper focuses on high-energy light (such as X-rays and gamma-rays), for which observatories can detect individual photons (quanta of light), measuring their incoming direction, arrival time, and energy. Our proposed Bayesian methodology uses both the spatial and energy information to identify point sources, that is, separate them from the spatially-varying background, to estimate their number, and to compute the posterior probabilities that each photon originated from each identified source. This is accomplished via a Dirichlet process mixture while the background is simultaneously reconstructed via a flexible Bayesian nonparametric model based on B-splines. Our proposed method is validated with a suite of simulation studies and illustrated with an application to a complex region of the sky observed by the \emph{Fermi} Gamma-ray Space Telescope.
Submitted 26 April, 2021; v1 submitted 23 April, 2021;
originally announced April 2021.
-
Astro2020 Science White Paper: The Next Decade of Astroinformatics and Astrostatistics
Authors:
A. Siemiginowska,
G. Eadie,
I. Czekala,
E. Feigelson,
E. B. Ford,
V. Kashyap,
M. Kuhn,
T. Loredo,
M. Ntampaka,
A. Stevens,
A. Avelino,
K. Borne,
T. Budavari,
B. Burkhart,
J. Cisewski-Kehe,
F. Civano,
I. Chilingarian,
D. A. van Dyk,
G. Fabbiano,
D. P. Finkbeiner,
D. Foreman-Mackey,
P. Freeman,
A. Fruscione,
A. A. Goodman,
M. Graham
, et al. (27 additional authors not shown)
Abstract:
Over the past century, major advances in astronomy and astrophysics have been largely driven by improvements in instrumentation and data collection. With the amassing of high quality data from new telescopes, and especially with the advent of deep and large astronomical surveys, it is becoming clear that future advances will also rely heavily on how those data are analyzed and interpreted. New methodologies derived from advances in statistics, computer science, and machine learning are beginning to be employed in sophisticated investigations that are not only bringing forth new discoveries, but are placing them on a solid footing. Progress in wide-field sky surveys, interferometric imaging, precision cosmology, exoplanet detection and characterization, and many subfields of stellar, Galactic and extragalactic astronomy, has resulted in complex data analysis challenges that must be solved to perform scientific inference. Research in astrostatistics and astroinformatics will be necessary to develop the state-of-the-art methodology needed in astronomy. Overcoming these challenges requires dedicated, interdisciplinary research. We recommend: (1) increasing funding for interdisciplinary projects in astrostatistics and astroinformatics; (2) dedicating space and time at conferences for interdisciplinary research and promotion; (3) developing sustainable funding for long-term astrostatistics appointments; and (4) funding infrastructure development for data archives and archive support, state-of-the-art algorithms, and efficient computing.
Submitted 15 March, 2019;
originally announced March 2019.
-
Incorporating Uncertainties in Atomic Data Into the Analysis of Solar and Stellar Observations: A Case Study in Fe XIII
Authors:
Xixi Yu,
Giulio Del Zanna,
David C. Stenning,
Jessi Cisewski-Kehe,
Vinay L. Kashyap,
Nathan Stein,
David A. van Dyk,
Harry P. Warren,
Mark A. Weber
Abstract:
Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. Ratios of emission lines, for example, can be used to infer the electron density of the emitting plasma. Similarly, the relative intensities of emission lines formed over a wide range of temperatures yield information on the temperature structure. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters. At present, however, atomic physics databases do not include uncertainties on the atomic parameters and there is no established methodology for using them even if they did. In this paper we develop simple models for the uncertainties in the collision strengths and decay rates for Fe XIII and apply them to the interpretation of density sensitive lines observed with the EUV Imaging Spectrometer (EIS) on Hinode. We incorporate these uncertainties in a Bayesian framework. We consider both a pragmatic Bayesian method where the atomic physics information is unaffected by the observed data, and a fully Bayesian method where the data can be used to probe the physics. The former generally increases the uncertainty in the inferred density by about a factor of 5 compared with models that incorporate only statistical uncertainties. The latter reduces the uncertainties on the inferred densities, but identifies areas of possible systematic problems with either the atomic physics or the observed intensities.
Submitted 17 September, 2018;
originally announced September 2018.
-
Bayesian Hierarchical Modelling of Initial-Final Mass Relations Across Star Clusters
Authors:
Shijing Si,
Ted von Hippel,
Elliot Robinson,
Elizabeth Jeffery,
David C. Stenning,
David A. van Dyk
Abstract:
The initial-final mass relation (IFMR) of white dwarfs (WDs) plays an important role in stellar evolution. To derive precise estimates of IFMRs and explore how they may vary among star clusters, we propose a Bayesian hierarchical model that pools photometric data from multiple star clusters. After performing a simulation study to show the benefits of the Bayesian hierarchical model, we apply this model to five star clusters: the Hyades, M67, NGC 188, NGC 2168, and NGC 2477, leading to reasonable and consistent estimates of IFMRs for these clusters. We illustrate how a cluster-specific analysis of NGC 188 using its own photometric data can produce an unreasonable IFMR since its WDs have a narrow range of zero-age main sequence (ZAMS) masses. However, the Bayesian hierarchical model corrects the cluster-specific analysis by borrowing strength from other clusters, thus generating more reliable estimates of IFMR parameters. The data analysis presents the benefits of Bayesian hierarchical modelling over conventional cluster-specific methods, which motivates us to elaborate the powerful statistical techniques in this article.
Submitted 17 July, 2018; v1 submitted 18 June, 2018;
originally announced June 2018.
-
Testing One Hypothesis Multiple Times: The Multidimensional Case
Authors:
Sara Algeri,
David A. van Dyk
Abstract:
The identification of new rare signals in data, the detection of a sudden change in a trend, and the selection of competing models, are among the most challenging problems in statistical practice. These challenges can be tackled using a test of hypothesis where a nuisance parameter is present only under the alternative, and a computationally efficient solution can be obtained by the "Testing One Hypothesis Multiple times" (TOHM) method. In the one-dimensional setting, a fine discretization of the space of the non-identifiable parameter is specified, and a global p-value is obtained by approximating the distribution of the supremum of the resulting stochastic process. In this paper, we propose a computationally efficient inferential tool to perform TOHM in the multidimensional setting. Here, the approximations of interest typically involve the expected Euler Characteristics (EC) of the excursion set of the underlying random field. We introduce a simple algorithm to compute the EC in multiple dimensions and for arbitrarily large significance levels. This leads to a highly generalizable computational tool to perform inference under non-standard regularity conditions.
Submitted 23 June, 2019; v1 submitted 10 March, 2018;
originally announced March 2018.
-
Multidimensional Data Driven Classification of Emission-line Galaxies
Authors:
Vasileios Stampoulis,
David A. van Dyk,
Vinay L. Kashyap,
Andreas Zezas
Abstract:
We propose a new soft clustering scheme for classifying galaxies in different activity classes using simultaneously 4 emission-line ratios: log([NII]/Ha), log([SII]/Ha), log([OI]/Ha) and log([OIII]/Hb). We fit 20 multivariate Gaussian distributions to the 4-dimensional distribution of these lines obtained from the Sloan Digital Sky Survey (SDSS) in order to capture local structures and subsequently group the multivariate Gaussian distributions to represent the complex multi-dimensional structure of the joint distribution of galaxy spectra in the 4 dimensional line ratio space. The main advantages of this method are the use of all four optical-line ratios simultaneously and the adoption of a clustering scheme. This maximises the available information, avoids contradicting classifications, and treats each class as a distribution resulting in soft classification boundaries and providing the probability for an object to belong to each class. We also introduce linear multi-dimensional decision surfaces using support vector machines based on the classification of our soft clustering scheme. This linear multi-dimensional hard clustering technique shows high classification accuracy with respect to our soft-clustering scheme.
Submitted 7 February, 2019; v1 submitted 4 February, 2018;
originally announced February 2018.
-
STACCATO: A Novel Solution to Supernova Photometric Classification with Biased Training Sets
Authors:
Esben A. Revsbech,
Roberto Trotta,
David A. van Dyk
Abstract:
We present a new solution to the problem of classifying Type Ia supernovae from their light curves alone given a spectroscopically confirmed but biased training set, circumventing the need to obtain an observationally expensive unbiased training set. We use Gaussian processes (GPs) to model the supernovae's (SN) light curves, and demonstrate that the choice of covariance function has only a small influence on the GPs' ability to accurately classify SNe. We extend and improve the approach of Richards et al. (2012) -- a diffusion map combined with a random forest classifier -- to deal specifically with the case of biased training sets. We propose a novel method, called STACCATO (SynThetically Augmented Light Curve ClassificATiOn), that synthetically augments a biased training set by generating additional training data from the fitted GPs. Key to the success of the method is the partitioning of the observations into subgroups based on their propensity score of being included in the training set. Using simulated light curve data, we show that STACCATO increases performance, as measured by the area under the Receiver Operating Characteristic curve (AUC), from 0.93 to 0.96, close to the AUC of 0.977 obtained using the 'gold standard' of an unbiased training set and significantly improving on the previous best result of 0.88. STACCATO also increases the true positive rate for SNIa classification by up to a factor of 50 for high-redshift/low brightness SNe.
Submitted 2 April, 2020; v1 submitted 12 June, 2017;
originally announced June 2017.
-
A Hierarchical Model for the Ages of Galactic Halo White Dwarfs
Authors:
Shijing Si,
David A. van Dyk,
Ted von Hippel,
Elliot Robinson,
Aaron Webster,
David Stenning
Abstract:
In astrophysics, we often aim to estimate one or more parameters for each member object in a population and study the distribution of the fitted parameters across the population. In this paper, we develop novel methods that allow us to take advantage of existing software designed for such case-by-case analyses to simultaneously fit parameters of both the individual objects and the parameters that quantify their distribution across the population. Our methods are based on Bayesian hierarchical modelling which is known to produce parameter estimators for the individual objects that are on average closer to their true values than estimators based on case-by-case analyses. We verify this in the context of estimating ages of Galactic halo white dwarfs (WDs) via a series of simulation studies. Finally, we deploy our new techniques on optical and near-infrared photometry of ten candidate halo WDs to obtain estimates of their ages along with an estimate of the mean age of Galactic halo WDs of [11.25, 12.96] Gyr. Although this sample is small, our technique lays the ground work for large-scale studies using data from the Gaia mission.
Submitted 18 June, 2018; v1 submitted 27 March, 2017;
originally announced March 2017.
-
The ACS Survey of Galactic Globular Clusters XIV: Bayesian Single-Population Analysis of 69 Globular Clusters
Authors:
R. Wagner-Kaiser,
A. Sarajedini,
T. von Hippel,
D. C. Stenning,
D. A. van Dyk,
E. Jeffery,
E. Robinson,
N. Stein,
J. Anderson,
W. H. Jefferys
Abstract:
We use Hubble Space Telescope (HST) imaging from the ACS Treasury Survey to determine fits for single population isochrones of 69 Galactic globular clusters. Using robust Bayesian analysis techniques, we simultaneously determine ages, distances, absorptions, and helium values for each cluster under the scenario of a "single" stellar population on model grids with solar ratio heavy element abundances. The set of cluster parameters is determined in a consistent and reproducible manner for all clusters using the Bayesian analysis suite BASE-9. Our results are used to re-visit the age-metallicity relation. We find correlations with helium and several other parameters such as metallicity, binary fraction, and proxies for cluster mass. The helium abundances of the clusters are also considered in the context of CNO abundances and the multiple population scenario.
Submitted 28 February, 2017;
originally announced February 2017.
-
Projected distances to host galaxy reduce SNIa dispersion
Authors:
R. Hill,
H. Shariff,
R. Trotta,
S. Ali-Khan,
X. Jiao,
Y. Liu,
S. K. Moon,
W. Parker,
M. Paulus,
D. A. van Dyk,
L. B. Lucy
Abstract:
We use multi-band imagery data from the Sloan Digital Sky Survey (SDSS) to measure projected distances of 302 supernova type Ia (SNIa) from the centre of their host galaxies, normalized to the galaxy's brightness scale length, with a Bayesian approach. We test the hypothesis that SNIas further away from the centre of their host galaxy are less subject to dust contamination (as the dust column density in their environment is smaller) and/or come from a more homogeneous environment. Using the Mann-Whitney U test, we find a statistically significant difference in the observed colour correction distribution between SNIas that are near and those that are far from the centre of their host. The local p-value is 3 x 10^{-3}, which is significant at the 5 per cent level after look-elsewhere effect correction. We estimate the residual scatter of the two subgroups to be 0.073 +/- 0.018 for the far SNIas, compared to 0.114 +/- 0.009 for the near SNIas -- an improvement of 30 per cent, albeit with a low statistical significance of 2sigma. This confirms the importance of host galaxy properties in correctly interpreting SNIa observations for cosmological inference.
Submitted 7 September, 2018; v1 submitted 13 December, 2016;
originally announced December 2016.
-
A Bayesian Analysis of the Ages of Four Open Clusters
Authors:
Elizabeth J. Jeffery,
Ted von Hippel,
David A. van Dyk,
David C. Stenning,
Elliot Robinson,
Nathan Stein,
W. H. Jefferys
Abstract:
In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with those of traditional "by-eye" isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 +/- 0.05, 1.02 +/- 0.02, 1.64 +/- 0.04, and 0.860 +/- 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to higher precision than that offered by these traditional methods of isochrone fitting.
Submitted 2 November, 2016;
originally announced November 2016.
-
Detecting Relativistic X-ray Jets in High-Redshift Quasars
Authors:
Kathryn McKeough,
Aneta Siemiginowska,
C. C. Cheung,
Lukasz Stawarz,
Vinay L. Kashyap,
Nathan Stein,
Vasileios Stampoulis,
David A. van Dyk,
J. F. C. Wardle,
N. P. Lee,
D. E. Harris,
D. A. Schwartz,
Davide Donato,
Laura Maraschi,
Fabrizio Tavecchio
Abstract:
We analyze Chandra X-ray images of a sample of 11 quasars that are known to contain kiloparsec scale radio jets. The sample consists of five high-redshift (z >= 3.6) flat-spectrum radio quasars, and six intermediate redshift (2.1 < z < 2.9) quasars. The dataset includes four sources with integrated steep radio spectra and seven with flat radio spectra. A total of 25 radio jet features are present in this sample. We apply a Bayesian multi-scale image reconstruction method to detect and measure the X-ray emission from the jets. We compute deviations from a baseline model that does not include the jet, and compare observed X-ray images with those computed with simulated images where no jet features exist. This allows us to compute p-value upper bounds on the significance that an X-ray jet is detected in a pre-determined region of interest. We detected 12 of the features unambiguously, and an additional 6 marginally. We also find residual emission in the cores of 3 quasars and in the background of 1 quasar that suggest the existence of unresolved X-ray jets. The dependence of the X-ray to radio luminosity ratio on redshift is a potential diagnostic of the emission mechanism, since the inverse Compton scattering of cosmic microwave background photons (IC/CMB) is thought to be redshift dependent, whereas in synchrotron models no clear redshift dependence is expected. We find that the high-redshift jets have X-ray to radio flux ratios that are marginally inconsistent with those from lower redshifts, suggesting that either the X-ray emission is due to the IC/CMB rather than the synchrotron process, or that high redshift jets are qualitatively different.
Submitted 12 September, 2016;
originally announced September 2016.
-
Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters
Authors:
R. Wagner-Kaiser,
D. C. Stenning,
A. Sarajedini,
T. von Hippel,
D. A. van Dyk,
E. Robinson,
N. Stein,
W. H. Jefferys
Abstract:
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster. We also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on inconsistencies between the theoretical models and the observed data.
Submitted 6 September, 2016;
originally announced September 2016.
-
Standardizing Type Ia supernovae using Near Infrared rebrightening time
Authors:
Hikmatali Shariff,
Suhail Dhawan,
Xiyun Jiao,
Bruno Leibundgut,
Roberto Trotta,
David A. van Dyk
Abstract:
Accurate standardisation of Type Ia supernovae (SNIa) is instrumental to the usage of SNIa as distance indicators. We analyse a homogeneous sample of 22 low-z SNIa, observed by the Carnegie Supernova Project (CSP) in the optical and near infra-red (NIR). We study the time of the second peak in the NIR band due to re-brightening, t2, as an alternative standardisation parameter of SNIa peak brightness. We use BAHAMAS, a Bayesian hierarchical model for SNIa cosmology, to determine the residual scatter in the Hubble diagram. We find that in the absence of a colour correction, t2 is a better standardisation parameter compared to stretch: t2 has a 1 sigma posterior interval for the Hubble residual scatter of [0.250, 0.257] , compared to [0.280, 0.287] when stretch (x1) alone is used. We demonstrate that when employed together with a colour correction, t2 and stretch lead to similar residual scatter. Using colour, stretch and t2 jointly as standardisation parameters does not result in any further reduction in scatter, suggesting that t2 carries redundant information with respect to stretch and colour. With a much larger SNIa NIR sample at higher redshift in the future, t2 could be a useful quantity to perform robustness checks of the standardisation procedure.
Submitted 25 May, 2016;
originally announced May 2016.
-
The Power of Principled Bayesian Methods in the Study of Stellar Evolution
Authors:
Ted von Hippel,
David A. van Dyk,
David C. Stenning,
Elliot Robinson,
Elizabeth Jeffery,
Nathan Stein,
William H. Jefferys,
Erin O'Malley
Abstract:
It takes years of effort employing the best telescopes and instruments to obtain high-quality stellar photometry, astrometry, and spectroscopy. Stellar evolution models contain the experience of lifetimes of theoretical calculations and testing. Yet most astronomers fit these valuable models to these precious datasets by eye. We show that a principled Bayesian approach to fitting models to stellar data yields substantially more information over a range of stellar astrophysics. We highlight advances in determining the ages of star clusters, mass ratios of binary stars, limitations in the accuracy of stellar models, post-main-sequence mass loss, and the ages of individual white dwarfs. We also outline a number of unsolved problems that would benefit from principled Bayesian analyses.
Submitted 9 May, 2016;
originally announced May 2016.
-
Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters II: NGC 5024, NGC 5272, and NGC 6352
Authors:
R. Wagner-Kaiser,
D. C. Stenning,
E. Robinson,
T. von Hippel,
A. Sarajedini,
D. A. van Dyk,
N. Stein,
W. H. Jefferys
Abstract:
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from $\sim$0.05 to 0.11 for these three clusters. Model grids with solar $α$-element abundances ([$α$/Fe] =0.0) and enhanced $α$-elements ([$α$/Fe]=0.4) are adopted.
Submitted 20 April, 2016;
originally announced April 2016.
-
Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters I: Statistical and Computational Methods
Authors:
D. C. Stenning,
R. Wagner-Kaiser,
E. Robinson,
D. A. van Dyk,
T. von Hippel,
A. Sarajedini,
N. Stein
Abstract:
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (van Dyk et al. 2009, Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties---age, metallicity, helium abundance, distance, absorption, and initial mass---are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show that model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).
Submitted 21 April, 2016; v1 submitted 20 April, 2016;
originally announced April 2016.
-
On methods for correcting for the look-elsewhere effect in searches for new physics
Authors:
Sara Algeri,
David A. van Dyk,
Jan Conrad,
Brandon Anderson
Abstract:
The search for new significant peaks over an energy spectrum often involves a statistical multiple hypothesis testing problem. Separate tests of hypothesis are conducted at different locations producing an ensemble of local p-values, the smallest of which is reported as evidence for the new resonance. Unfortunately, controlling the false detection rate (type I error rate) of such procedures may lead to excessively stringent acceptance criteria. In the recent physics literature, two promising statistical tools have been proposed to overcome these limitations. In 2005, a method to "find needles in haystacks" was introduced by Pilla et al. [1], and a second method was later proposed by Gross and Vitells [2] in the context of the "look elsewhere effect" and trial factors. We show that, for relatively small sample sizes, the former leads to an artificial inflation of statistical power that stems from an increase in the false detection rate, whereas the two methods exhibit similar performance for large sample sizes. We apply the methods to realistic simulations of the Fermi Large Area Telescope data, in particular the search for dark matter annihilation lines. Further, we discuss the counter-intuitive scenario where the look-elsewhere corrections are more conservative than much more computationally efficient corrections for multiple hypothesis testing. Finally, we provide general guidelines for navigating the tradeoffs between statistical and computational efficiency when selecting a statistical procedure for signal detection.
Submitted 15 December, 2016; v1 submitted 11 February, 2016;
originally announced February 2016.
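As a rough illustration of the "computationally efficient corrections for multiple hypothesis testing" that the abstract above contrasts with look-elsewhere corrections, the sketch below adjusts the smallest local p-value with the Bonferroni bound and, assuming independent tests, with the Šidák formula. These are standard textbook corrections and are not the methods of Pilla et al. or Gross and Vitells; the function names and the example p-values are hypothetical.

```python
def bonferroni_global_p(local_p_values):
    """Upper bound on the global p-value: m * min(p), capped at 1."""
    m = len(local_p_values)
    return min(1.0, m * min(local_p_values))


def sidak_global_p(local_p_values):
    """Global p-value for the smallest of m local p-values,
    assuming the m tests are independent."""
    m = len(local_p_values)
    return 1.0 - (1.0 - min(local_p_values)) ** m


# Hypothetical local p-values from tests at five energy locations.
local_p = [0.20, 0.003, 0.47, 0.08, 0.61]
print(bonferroni_global_p(local_p))  # Bonferroni bound
print(sidak_global_p(local_p))       # never larger than the Bonferroni bound
```

Both corrections need only the list of local p-values, which is why they are cheap; neither accounts for correlation between neighbouring search locations.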
-
Bayesian Estimates of Astronomical Time Delays between Gravitationally Lensed Stochastic Light Curves
Authors:
Hyungsuk Tak,
Kaisey Mandel,
David A. van Dyk,
Vinay L. Kashyap,
Xiao-Li Meng,
Aneta Siemiginowska
Abstract:
The gravitational field of a galaxy can act as a lens and deflect the light emitted by a more distant object such as a quasar. Strong gravitational lensing causes multiple images of the same quasar to appear in the sky. Since the light in each gravitationally lensed image traverses a different path length from the quasar to the Earth, fluctuations in the source brightness are observed in the several images at different times. The time delay between these fluctuations can be used to constrain cosmological parameters and can be inferred from the time series of brightness data or light curves of each image. To estimate the time delay, we construct a model based on a state-space representation for irregularly observed time series generated by a latent continuous-time Ornstein-Uhlenbeck process. We account for microlensing, an additional source of independent long-term extrinsic variability, via a polynomial regression. Our Bayesian strategy adopts a Metropolis-Hastings within Gibbs sampler. We improve the sampler by using an ancillarity-sufficiency interweaving strategy and adaptive Markov chain Monte Carlo. We introduce a profile likelihood of the time delay as an approximation of its marginal posterior distribution. The Bayesian and profile likelihood approaches complement each other, producing almost identical results; the Bayesian method is more principled but the profile likelihood is simpler to implement. We demonstrate our estimation strategy using simulated data of doubly- and quadruply-lensed quasars, and observed data from quasars Q0957+561 and J1029+2623.
Submitted 30 January, 2017; v1 submitted 2 February, 2016;
originally announced February 2016.
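The data-generating idea in the abstract above (a latent Ornstein-Uhlenbeck process observed at irregular times, with one image lagging the other) can be sketched as follows. This is only a simulation under stated assumptions, not the authors' estimation procedure: the parametrization dX = -(X - mu)/tau dt + sigma dB, the sign convention for the delay, and all numerical values are assumptions, and measurement noise and the polynomial microlensing term are omitted.

```python
import math
import random


def simulate_ou(times, mu, sigma, tau, x0, rng):
    """Exact OU draws at sorted, possibly irregular, times.

    Assumed parametrization: dX = -(X - mu)/tau dt + sigma dB.
    """
    values = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        decay = math.exp(-(t_next - t_prev) / tau)
        mean = mu + (values[-1] - mu) * decay
        sd = sigma * math.sqrt(0.5 * tau * (1.0 - decay ** 2))
        values.append(mean + sd * rng.gauss(0.0, 1.0))
    return values


rng = random.Random(1)
delay, offset = 12.5, 0.3  # hypothetical time delay (days) and magnitude offset
obs_a = sorted(rng.uniform(0.0, 100.0) for _ in range(40))  # epochs of image A
obs_b = sorted(rng.uniform(0.0, 100.0) for _ in range(40))  # epochs of image B

# Assumed convention: image B at time t shows the latent curve at t - delay,
# so the latent process is simulated on the merged grid of both time sets.
latent_times = sorted(set(obs_a) | {t - delay for t in obs_b})
latent = dict(zip(latent_times,
                  simulate_ou(latent_times, mu=18.0, sigma=0.05,
                              tau=200.0, x0=18.0, rng=rng)))
curve_a = [latent[t] for t in obs_a]
curve_b = [latent[t - delay] + offset for t in obs_b]
```

Using the exact transition density rather than an Euler step keeps the simulation valid for arbitrarily uneven gaps between observations.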
-
Preprocessing Solar Images while Preserving their Latent Structure
Authors:
Nathan M Stein,
David A van Dyk,
Vinay L Kashyap
Abstract:
Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for later downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of $\sim$10 segments rather than on each of $\sim$10$^7$ pixels, reducing computing time by a factor of $\sim$10$^6$. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015.
Submitted 14 December, 2015;
originally announced December 2015.
-
BAHAMAS: new SNIa analysis reveals inconsistencies with standard cosmology
Authors:
H. Shariff,
X. Jiao,
R. Trotta,
D. A. van Dyk
Abstract:
We present results obtained by applying our BAyesian HierArchical Modeling for the Analysis of Supernova cosmology (BAHAMAS) software package to the 740 spectroscopically confirmed supernovae type Ia (SNIa) from the "Joint Light-curve Analysis" (JLA) dataset. We simultaneously determine cosmological parameters and standardization parameters, including host galaxy mass corrections, residual scatter and object-by-object intrinsic magnitudes. Combining JLA and Planck Cosmic Microwave Background data, we find significant discrepancies in cosmological parameter constraints with respect to the standard analysis: we find Omega_M = 0.399+/-0.027, 2.8σ higher than previously reported and w = -0.910+/-0.045, 1.6σ higher than the standard analysis. We determine the residual scatter to be sigma_res = 0.104+/-0.005.
We confirm (at the 95% probability level) the existence of two sub-populations segregated by host galaxy mass, separated at log_{10}(M/M_solar) = 10, differing in mean intrinsic magnitude by 0.055+/-0.022 mag, lower than previously reported. Cosmological parameter constraints are however unaffected by inclusion of host galaxy mass corrections. We find ~4σ evidence for a sharp drop in the value of the color correction parameter, beta(z), at a redshift z_trans = 0.662+/-0.055. We rule out some possible explanations for this behaviour, which remains unexplained.
Submitted 18 April, 2016; v1 submitted 20 October, 2015;
originally announced October 2015.
-
Detecting Unspecified Structure in Low-Count Images
Authors:
Nathan M. Stein,
David A. van Dyk,
Vinay L. Kashyap,
Aneta Siemiginowska
Abstract:
Unexpected structure in images of astronomical sources often presents itself upon visual inspection of the image, but such apparent structure may either correspond to true features in the source or be due to noise in the data. This paper presents a method for testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image. To infer image structure, we conduct a Bayesian analysis of a full model that uses a multiscale component to allow flexible departures from the posited null model. As a test statistic, we use a tail probability of the posterior distribution under the full model. This choice of test statistic allows us to estimate a computationally efficient upper bound on a p-value that enables us to draw strong conclusions even when there are limited computational resources that can be devoted to simulations under the null model. We demonstrate the statistical performance of our method on simulated images. Applying our method to an X-ray image of the quasar 0730+257, we find significant evidence against the null model of a single point source and uniform background, lending support to the claim of an X-ray jet.
Submitted 15 October, 2015;
originally announced October 2015.
-
A method for comparing non-nested models with application to astrophysical searches for new physics
Authors:
Sara Algeri,
Jan Conrad,
David A. van Dyk
Abstract:
Searches for unknown physics and decisions between competing astrophysical models to explain data both rely on statistical hypothesis testing. The usual approach in searches for new physical phenomena is based on the statistical Likelihood Ratio Test (LRT) and its asymptotic properties. In the common situation when neither of the two models under comparison is a special case of the other, i.e., when the hypotheses are non-nested, this test is not applicable. In astrophysics, this problem occurs when two models that reside in different parameter spaces are to be compared. An important example is the recently reported excess emission in astrophysical $γ$-rays and the question of whether its origin is known astrophysics or dark matter. We develop and study a new, simple, generally applicable, frequentist method and validate its statistical properties using a suite of simulation studies. We exemplify it on realistic simulated data of the Fermi-LAT $γ$-ray satellite, where non-nested hypothesis testing appears in the search for particle dark matter.
Submitted 19 February, 2016; v1 submitted 3 September, 2015;
originally announced September 2015.
-
Detecting Abrupt Changes in the Spectra of High-Energy Astrophysical Sources
Authors:
Raymond K. W. Wong,
Vinay L. Kashyap,
Thomas C. M. Lee,
David A. van Dyk
Abstract:
Variable-intensity astronomical sources are the result of complex and often extreme physical processes. Abrupt changes in source intensity are typically accompanied by equally sudden spectral shifts, i.e., sudden changes in the wavelength distribution of the emission. This article develops a method for modeling photon counts collected from observation of such sources. We embed change points into a marked Poisson process, where photon wavelengths are regarded as marks and both the Poisson intensity parameter and the distribution of the marks are allowed to change. To the best of our knowledge, this is the first effort to embed change points into a marked Poisson process. Between the change points, the spectrum is modeled non-parametrically using a mixture of a smooth radial basis expansion and a number of local deviations from the smooth term representing spectral emission lines. Because the model is overparameterized, we employ an $\ell_1$ penalty. The tuning parameter in the penalty and the number of change points are determined via the minimum description length principle. Our method is validated via a series of simulation studies and its practical utility is illustrated in the analysis of the ultra-fast rotating yellow giant star known as FK Com.
Submitted 10 December, 2015; v1 submitted 27 August, 2015;
originally announced August 2015.
-
Disentangling Overlapping Astronomical Sources using Spatial and Spectral Information
Authors:
David E. Jones,
Vinay L. Kashyap,
David A. van Dyk
Abstract:
We present a powerful new algorithm that combines both spatial information (event locations and the point spread function) and spectral information (photon energies) to separate photons from overlapping sources. We use Bayesian statistical methods to simultaneously infer the number of overlapping sources, to probabilistically separate the photons among the sources, and to fit the parameters describing the individual sources. Using the Bayesian joint posterior distribution, we are able to coherently quantify the uncertainties associated with all these parameters. The advantages of combining spatial and spectral information are demonstrated through a simulation study. The utility of the approach is then illustrated by analysis of observations of FK Aqr and FL Aqr with the XMM-Newton Observatory and the central region of the Orion Nebula Cluster with the Chandra X-ray Observatory.
Submitted 21 May, 2015; v1 submitted 26 November, 2014;
originally announced November 2014.
-
A Bayesian Analysis of the Correlations Among Sunspot Cycles
Authors:
Yaming Yu,
David A. van Dyk,
Vinay L. Kashyap,
C. Alex Young
Abstract:
Sunspot numbers form a comprehensive, long-duration proxy of solar activity and have been used numerous times to empirically investigate the properties of the solar cycle. A number of correlations have been discovered over the 24 cycles for which observational records are available. Here we carry out a sophisticated statistical analysis of the sunspot record that reaffirms these correlations, and sets up an empirical predictive framework for future cycles. An advantage of our approach is that it allows for rigorous assessment of both the statistical significance of various cycle features and the uncertainty associated with predictions. We summarize the data into three sequential relations that estimate the amplitude, duration, and time of rise to maximum for any cycle, given the values from the previous cycle. We find that there is no indication of a persistence in predictive power beyond one cycle, and conclude that the dynamo does not retain memory beyond one cycle. Based on sunspot records up to October 2011, we obtain, for Cycle 24, an estimated maximum smoothed monthly sunspot number of 97 +- 15, to occur in January--February 2014 +- 6 months.
Submitted 8 August, 2012;
originally announced August 2012.
-
Accounting for Calibration Uncertainties in X-ray Analysis: Effective Areas in Spectral Fitting
Authors:
Hyunsook Lee,
Vinay L. Kashyap,
David A. van Dyk,
Alanna Connors,
Jeremy J. Drake,
Rima Izem,
Xiao-Li Meng,
Shandong Min,
Taeyoung Park,
Pete Ratzlaff,
Aneta Siemiginowska,
Andreas Zezas
Abstract:
While considerable advances have been made in accounting for statistical uncertainties in astronomical analyses, systematic instrumental uncertainties have been generally ignored. This can be crucial to a proper interpretation of analysis results because instrumental calibration uncertainty is a form of systematic uncertainty. Ignoring it can underestimate error bars and introduce bias into the fitted values of model parameters. Accounting for such uncertainties currently requires extensive case-specific simulations if using existing analysis packages. Here we present general statistical methods that incorporate calibration uncertainties into spectral analysis of high-energy data. We first present a method based on multiple imputation that can be applied with any fitting method, but is necessarily approximate. We then describe a more exact Bayesian approach that works in conjunction with a Markov chain Monte Carlo based fitting. We explore methods for improving computational efficiency, and in particular detail a method of summarizing calibration uncertainties with a principal component analysis of samples of plausible calibration files. This method is implemented using recently codified Chandra effective area uncertainties for low-resolution spectral analysis and is verified using both simulated and actual Chandra data. Our procedure for incorporating effective area uncertainty is easily generalized to other types of calibration uncertainties.
Submitted 22 February, 2011;
originally announced February 2011.
-
On Computing Upper Limits to Source Intensities
Authors:
Vinay L. Kashyap,
David A. van Dyk,
Alanna Connors,
Peter Freeman,
Aneta Siemiginowska,
Jin Xu,
Andreas Zezas
Abstract:
A common problem in astrophysics is determining how bright a source could be and still not be detected. Despite the simplicity with which the problem can be stated, the solution involves complex statistical issues that require careful analysis. In contrast to the confidence bound, this concept has never been formally analyzed, leading to a great variety of often ad hoc solutions. Here we formulate and describe the problem in a self-consistent manner. Detection significance is usually defined by the acceptable proportion of false positives (the Type I error), and we invoke the complementary concept of false negatives (the Type II error), based on the statistical power of a test, to compute an upper limit to the detectable source intensity. To determine the minimum intensity that a source must have for it to be detected, we first define a detection threshold, and then compute the probabilities of detecting sources of various intensities at the given threshold. The intensity that corresponds to the specified Type II error probability defines that minimum intensity, and is identified as the upper limit. Thus, an upper limit is a characteristic of the detection procedure rather than the strength of any particular source and should not be confused with confidence intervals or other estimates of source intensity. This is particularly important given the large number of catalogs that are being generated from increasingly sensitive surveys. We discuss the differences between these upper limits and confidence bounds. Both measures are useful quantities that should be reported in order to extract the most science from catalogs, though they answer different statistical questions: an upper bound describes an inference range on the source intensity, while an upper limit calibrates the detection process. We provide a recipe for computing upper limits that applies to all detection algorithms.
Submitted 22 June, 2010;
originally announced June 2010.
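The two-step recipe in the abstract above (fix a detection threshold from the Type I error, then find the intensity detected with the specified Type II error) can be sketched for the simplest case. The sketch assumes a single counting cell with a known Poisson background and a "counts exceed a threshold" detection rule; the function names and example numbers are hypothetical, and the paper's recipe is stated to cover general detection algorithms rather than only this one.

```python
import math


def poisson_sf(k, mu):
    """P(N > k) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    cdf = term
    for i in range(1, k + 1):
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def detection_threshold(background, alpha):
    """Smallest count n such that P(N > n | background only) <= alpha."""
    n = 0
    while poisson_sf(n, background) > alpha:
        n += 1
    return n


def upper_limit(background, alpha, beta, tol=1e-6):
    """Source intensity detected with probability 1 - beta at the threshold
    set by alpha. Assumes alpha < 1 - beta. Solved by bisection, using the
    fact that the detection probability increases with intensity."""
    n_star = detection_threshold(background, alpha)
    lo, hi = 0.0, 1.0
    while poisson_sf(n_star, hi + background) < 1.0 - beta:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_sf(n_star, mid + background) < 1.0 - beta:
            lo = mid
        else:
            hi = mid
    return n_star, 0.5 * (lo + hi)


# Hypothetical case: 1 expected background count, alpha = 0.05, beta = 0.5.
print(upper_limit(1.0, 0.05, 0.5))
```

Note that the result depends only on the background, alpha, and beta, not on any observed counts, which mirrors the abstract's point that an upper limit characterizes the detection procedure rather than a particular source.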
-
Searching for Narrow Emission Lines in X-ray Spectra: Computation and Methods
Authors:
Taeyoung Park,
David A. van Dyk,
Aneta Siemiginowska
Abstract:
The detection and quantification of narrow emission lines in X-ray spectra is a challenging statistical task. The Poisson nature of the photon counts leads to local random fluctuations in the observed spectrum that often result in excess emission in a narrow band of energy resembling a weak narrow line. From a formal statistical perspective, this leads to a (sometimes highly) multimodal likelihood. Many standard statistical procedures are based on (asymptotic) Gaussian approximations to the likelihood and simply cannot be used in such settings. Bayesian methods offer a more direct paradigm for accounting for such complicated likelihood functions, but even here, multimodal likelihoods pose significant computational challenges. The new Markov chain Monte Carlo (MCMC) methods developed in 2008 by van Dyk and Park, however, are able to fully explore the complex posterior distribution of the location of a narrow line, and thus provide valid statistical inference. Even with these computational tools, standard statistical quantities such as means and standard deviations cannot adequately summarize inference and standard testing procedures cannot be used to test for emission lines. In this paper, we use new efficient MCMC algorithms to fit the location of narrow emission lines, we develop new statistical strategies for summarizing highly multimodal distributions and quantifying valid statistical inference, and we extend the method of posterior predictive p-values proposed by Protassov et al. (2002) to test for the presence of narrow emission lines in X-ray spectra. We illustrate and validate our methods using simulation studies and apply them to the Chandra observations of the high redshift quasar PG1634+706.
Submitted 23 August, 2008;
originally announced August 2008.
-
Bayesian Estimation of Hardness Ratios: Modeling and Computations
Authors:
Taeyoung Park,
Vinay L. Kashyap,
Aneta Siemiginowska,
David A. van Dyk,
Andreas Zezas,
Craig Heinke,
Bradford J. Wargelin
Abstract:
A commonly used measure to summarize the nature of a photon spectrum is the so-called Hardness Ratio, which compares the number of counts observed in different passbands. The hardness ratio is especially useful to distinguish between and categorize weak sources as a proxy for detailed spectral fitting. However, in this regime classical methods of error propagation fail, and the estimates of spectral hardness become unreliable. Here we develop a rigorous statistical treatment of hardness ratios that properly deals with detected photons as independent Poisson random variables and correctly deals with the non-Gaussian nature of the error propagation. The method is Bayesian in nature, and thus can be generalized to carry out a multitude of source-population--based analyses. We verify our method with simulation studies, and compare it with the classical method. We apply this method to real-world examples, such as the identification of candidate quiescent low-mass X-ray binaries in globular clusters, and tracking the time evolution of a flare on a low-mass star.
Submitted 10 June, 2006;
originally announced June 2006.
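A minimal Monte Carlo sketch of the idea in the abstract above, treating soft-band and hard-band counts as independent Poisson variables and propagating the non-Gaussian uncertainty by posterior sampling. Several simplifications are assumptions made here and not taken from the abstract: the hardness ratio is written as (H - S)/(H + S), which is one common convention; background and instrument response are ignored; and a flat prior on each intensity is used, so that each posterior is a Gamma distribution with shape counts + 1. The function name and example counts are hypothetical.

```python
import random


def hardness_ratio_posterior(soft_counts, hard_counts, n_draws=20000, seed=0):
    """Posterior median and central 68% interval of (H - S)/(H + S).

    Assumes no background and flat priors on the Poisson intensities, so
    each intensity has a Gamma(counts + 1, scale=1) posterior.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        lam_s = rng.gammavariate(soft_counts + 1, 1.0)
        lam_h = rng.gammavariate(hard_counts + 1, 1.0)
        draws.append((lam_h - lam_s) / (lam_h + lam_s))
    draws.sort()
    median = draws[n_draws // 2]
    lower = draws[int(0.16 * n_draws)]
    upper = draws[int(0.84 * n_draws)]
    return median, (lower, upper)


# Hypothetical weak source: 3 soft-band counts and 9 hard-band counts.
median, interval = hardness_ratio_posterior(3, 9)
print(median, interval)
```

Because the interval is read off the sorted draws, it is allowed to be asymmetric and always stays within [-1, 1], unlike a Gaussian error bar from classical propagation.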
-
Statistics: Handle with Care, Detecting Multiple Model Components with the Likelihood Ratio Test
Authors:
Rostislav Protassov,
David A. van Dyk,
Alanna Connors,
Vinay L. Kashyap,
Aneta Siemiginowska
Abstract:
The likelihood ratio test (LRT) and the related $F$ test do not (even asymptotically) adhere to their nominal $χ^2$ and $F$ distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and non-detections into doubt. Although there are many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the $F$ test for detecting a line in a spectral model or a source above background despite the lack of certain required regularity conditions. In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, {\it contrary to common practice, the nominal $χ^2$ distribution for the LRT or the $F$ distribution for the $F$ test should not be used}. In this paper, we characterize an important class of problems where the LRT and the $F$ test fail and illustrate this non-standard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability-values. We present this method in some detail, as it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of May 8, 1997 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation.
Submitted 31 January, 2002;
originally announced January 2002.
-
Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulation
Authors:
David A. van Dyk,
Alanna Connors,
Vinay L. Kashyap,
Aneta Siemiginowska
Abstract:
Over the past 10 years Bayesian methods have rapidly grown more popular as several computationally intensive statistical algorithms have become feasible with increased computer power. In this paper, we begin with a general description of the Bayesian paradigm for statistical inference and the various state-of-the-art model fitting techniques that we employ (e.g., Gibbs sampler and Metropolis-Hastings). These algorithms are very flexible and can be used to fit models that account for the highly hierarchical structure inherent in the collection of high-quality spectra and thus can keep pace with the accelerating progress of new space telescope designs. The methods we develop, which will soon be available in the CIAO software package, explicitly model photon arrivals as a Poisson process and, thus, have no difficulty with high-resolution, low-count X-ray and gamma-ray data. We expect these methods to be useful not only for the recently launched Chandra X-ray observatory and XMM but also new generation telescopes such as Constellation X, GLAST, etc. In the context of two examples (Quasar S5 0014+813 and Hybrid-Chromosphere Supergiant Star alpha TrA) we illustrate a new highly structured model and how Bayesian posterior sampling can be used to compute estimates, error bars, and credible intervals for the various model parameters.
Submitted 10 August, 2000;
originally announced August 2000.