-
MaxTDA: Robust Statistical Inference for Maximal Persistence in Topological Data Analysis
Authors:
Sixtus Dakurah,
Jessi Cisewski-Kehe
Abstract:
Persistent homology is an area within topological data analysis (TDA) that can uncover different dimensional holes (connected components, loops, voids, etc.) in data. The holes are characterized, in part, by how long they persist across different scales. Noisy data can result in many additional holes that are not true topological signal. Various robust TDA techniques have been proposed to reduce the number of noisy holes; however, these robust methods tend to also reduce the topological signal. This work introduces Maximal TDA (MaxTDA), a statistical framework addressing a limitation in TDA wherein robust inference techniques systematically underestimate the persistence of significant homological features. MaxTDA combines kernel density estimation with level-set thresholding via rejection sampling to generate consistent estimators for the maximal persistence features that minimize bias while maintaining robustness to noise and outliers. We establish the consistency of the sampling procedure and the stability of the maximal persistence estimator. The framework also enables statistical inference on topological features through rejection bands, constructed from quantiles that bound the estimator's deviation probability. MaxTDA is particularly valuable in applications where precise quantification of statistically significant topological features is essential for revealing underlying structural properties in complex datasets. Numerical simulations across varied datasets, including an example from exoplanet astronomy, highlight the effectiveness of MaxTDA in recovering true topological signals.
Submitted 4 April, 2025;
originally announced April 2025.
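The MaxTDA abstract above combines kernel density estimation with level-set thresholding via rejection sampling. A minimal sketch of that idea, assuming a Gaussian kernel with a hand-picked bandwidth `h` and a threshold `tau` taken as a quantile of the estimated density at the data points (both hypothetical choices, not the paper's tuning): draw points from the KDE and reject draws whose estimated density falls below `tau`. The persistent homology computed on the retained points afterwards is not shown.

```python
# Illustrative sketch; function names, bandwidth and threshold rule are hypothetical.
import numpy as np


def gaussian_kde(data, h):
    """Return a function that evaluates a Gaussian KDE (bandwidth h) at query points."""
    n, d = data.shape
    norm = n * (2.0 * np.pi * h**2) ** (d / 2.0)

    def density(x):
        # Squared distance from every query point to every data point.
        sq = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * h**2)).sum(axis=1) / norm

    return density


def sample_upper_level_set(data, h, level_quantile, m, rng):
    """Draw m points from the KDE restricted to {x : density(x) >= tau} by rejection."""
    density = gaussian_kde(data, h)
    # Hypothetical threshold rule: a quantile of the density at the observed points.
    tau = np.quantile(density(data), level_quantile)
    batches, total = [], 0
    while total < m:
        # Sampling from a Gaussian KDE: pick a data point, add N(0, h^2 I) noise.
        idx = rng.integers(0, len(data), size=m)
        proposals = data[idx] + h * rng.standard_normal((m, data.shape[1]))
        # Reject proposals that land in low-density (noise-dominated) regions.
        kept = proposals[density(proposals) >= tau]
        batches.append(kept)
        total += len(kept)
    return np.vstack(batches)[:m], tau


# Usage: a noisy circle (one true loop) contaminated by uniform outliers.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 300)
circle = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((300, 2))
outliers = rng.uniform(-1.5, 1.5, (30, 2))
data = np.vstack([circle, outliers])
samples, tau = sample_upper_level_set(data, h=0.1, level_quantile=0.2, m=500, rng=rng)
```

Raising `level_quantile` discards more of the outliers but also thins the loop itself, which is the signal-versus-robustness trade-off the abstract describes.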
-
Searching for Low-Mass Exoplanets Amid Stellar Variability with a Fixed Effects Linear Model of Line-by-Line Shape Changes
Authors:
Joseph Salzer,
Jessi Cisewski-Kehe,
Eric B. Ford,
Lily L. Zhao
Abstract:
The radial velocity (RV) method, also known as Doppler spectroscopy, is a powerful technique for exoplanet discovery and characterization. In recent years, progress has been made thanks to improvements in the quality of spectra from new extreme-precision RV spectrometers. However, detecting the RV signals of Earth-like exoplanets remains challenging, as the spectroscopic signatures of low-mass planets can be obscured or confused with intrinsic stellar variability. Changes in the shapes of spectral lines across time can provide valuable information for disentangling stellar activity from true Doppler shifts caused by low-mass exoplanets. In this work, we present a fixed effects linear model to estimate RV signals that controls for changes in line shapes by aggregating information from hundreds of spectral lines. Our methodology incorporates a wild-bootstrap approach for modeling uncertainty and cross-validation to control for overfitting. We evaluate the model's ability to remove stellar activity using solar observations from the NEID spectrograph, as the Sun's true center-of-mass motion is precisely known. Including line shape-change covariates reduces the RV root-mean-square errors by approximately 70% (from 1.919 m s$^{-1}$ to 0.575 m s$^{-1}$) relative to using only the line-by-line Doppler shifts. The magnitude of the residuals is significantly smaller than that of traditional cross-correlation function (CCF)-based RV estimators and comparable to other state-of-the-art methods for mitigating stellar variability.
Submitted 17 February, 2025;
originally announced February 2025.
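As a rough illustration of a fixed effects linear model for line-by-line RVs, the sketch below assumes a toy specification (hypothetical, not the paper's exact model): line l at epoch t has RV `rv[l, t] = a_l + v_t + b * shape[l, t] + noise`, where `a_l` is a per-line offset, `v_t` is the per-epoch velocity of interest, and `shape` is a single simulated shape-change covariate. Ordinary least squares with indicator columns recovers `v_t` up to an additive constant. The baseline averages the line-by-line RVs per epoch while ignoring shape, loosely mirroring "only the line-by-line Doppler shifts"; the wild bootstrap and cross-validation steps are omitted.

```python
# Illustrative sketch; the model form, names and simulated data are hypothetical.
import numpy as np


def fit_fixed_effects(rv, shape):
    """Least-squares fit of rv[l, t] = a_l + v_t + b * shape[l, t] + noise.

    rv and shape have shape (n_lines, n_epochs). Returns the centered per-epoch
    velocities v_t and the shape coefficient b.
    """
    n_lines, n_epochs = rv.shape
    n_obs = n_lines * n_epochs
    rows = np.arange(n_obs)
    line_idx = np.repeat(np.arange(n_lines), n_epochs)   # matches rv.reshape(-1)
    epoch_idx = np.tile(np.arange(n_epochs), n_lines)

    X_line = np.zeros((n_obs, n_lines))
    X_line[rows, line_idx] = 1.0
    # Drop the first epoch's indicator: v_t is only identified up to a constant.
    X_epoch = np.zeros((n_obs, n_epochs - 1))
    later = epoch_idx > 0
    X_epoch[rows[later], epoch_idx[later] - 1] = 1.0

    X = np.hstack([X_line, X_epoch, shape.reshape(-1, 1)])
    coef, *_ = np.linalg.lstsq(X, rv.reshape(-1), rcond=None)
    v = np.concatenate([[0.0], coef[n_lines:n_lines + n_epochs - 1]])
    return v - v.mean(), coef[-1]


rng = np.random.default_rng(1)
n_lines, n_epochs = 50, 30
v_true = rng.normal(0.0, 1.0, n_epochs)
v_true -= v_true.mean()
offsets = rng.normal(0.0, 5.0, n_lines)
shape = rng.normal(0.0, 1.0, (n_lines, n_epochs))   # stand-in shape-change covariate
noise = 0.1 * rng.standard_normal((n_lines, n_epochs))
rv = offsets[:, None] + v_true[None, :] + 2.0 * shape + noise

v_fe, b_hat = fit_fixed_effects(rv, shape)
# Baseline that ignores line shapes: average the line-by-line RVs per epoch.
v_naive = (rv - rv.mean(axis=1, keepdims=True)).mean(axis=0)
rms_fe = np.sqrt(np.mean((v_fe - v_true) ** 2))
rms_naive = np.sqrt(np.mean((v_naive - v_true) ** 2))
```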
-
A Subsequence Approach to Topological Data Analysis for Irregularly-Spaced Time Series
Authors:
Sixtus Dakurah,
Jessi Cisewski-Kehe
Abstract:
A time-delay embedding (TDE), grounded in the framework of Takens's Theorem, provides a mechanism to represent and analyze the inherent dynamics of time-series data. Recently, topological data analysis (TDA) methods have been applied to study this time series representation mainly through the lens of persistent homology. Current literature on the fusion of TDE and TDA is adept at analyzing uniformly-spaced time series observations. This work introduces a novel subsequence embedding method for irregularly-spaced time-series data. We show that this method preserves the original state space topology while reducing spurious homological features. Theoretical stability results and convergence properties of the proposed method in the presence of noise and varying levels of irregularity in the spacing of the time series are established. Numerical studies and an application to real data illustrate the performance of the proposed method.
Submitted 17 October, 2024;
originally announced October 2024.
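For reference, the standard time-delay embedding for uniformly-spaced data, which the subsequence method above extends to irregular spacing, maps a series x_1, ..., x_n to the vectors (x_i, x_{i+τ}, ..., x_{i+(m-1)τ}). A minimal sketch with hypothetical parameter names `dim` (m) and `lag` (τ); the paper's subsequence embedding itself is not reproduced here.

```python
# Illustrative sketch of the uniform-spacing TDE; names are hypothetical.
import numpy as np


def time_delay_embedding(x, dim, lag):
    """Return the delay vectors (x[i], x[i+lag], ..., x[i+(dim-1)*lag]) as rows."""
    x = np.asarray(x)
    n_vectors = len(x) - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and lag")
    return np.stack([x[j * lag : j * lag + n_vectors] for j in range(dim)], axis=1)


# Usage: a sampled sine wave with a lag of about a quarter period embeds as an
# approximate circle in 2D, the kind of loop (H1 feature) persistent homology detects.
t = np.linspace(0.0, 4.0 * np.pi, 200)
cloud = time_delay_embedding(np.sin(t), dim=2, lag=25)
```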
-
A Divide-and-Conquer Approach to Persistent Homology
Authors:
Chenghui Li,
Jessi Cisewski-Kehe
Abstract:
Persistent homology is a tool of topological data analysis that has been used in a variety of settings to characterize different dimensional holes in data. However, persistent homology computations can be memory intensive, with a computational complexity that does not scale well as the data size becomes large. In this work, we propose a divide-and-conquer (DaC) method to mitigate these issues. The proposed algorithm efficiently finds small, medium, and large-scale holes by partitioning the data into sub-regions and using a Vietoris-Rips filtration. Furthermore, we provide theoretical results that quantify the bottleneck distance between the DaC and the true persistence diagrams, as well as the recovery probability of holes in the data. We empirically verify that the observed rate coincides with our theoretical rate, and find that DaC outperforms, in both memory and computational complexity, an alternative method that relies on a clustering preprocessing step to reduce the cost of the persistent homology computations. Finally, we test our algorithm using spatial data of the locations of lakes in Wisconsin, where classical persistent homology is computationally infeasible.
Submitted 25 September, 2024;
originally announced October 2024.
-
High-energy Neutrino Source Cross-correlations with Nearest-neighbor Distributions
Authors:
Zhuoyang Zhou,
Jessi Cisewski-Kehe,
Ke Fang,
Arka Banerjee
Abstract:
The astrophysical origins of the majority of the IceCube neutrinos remain unknown. Effectively characterizing the spatial distribution of the neutrino samples and associating the events with astrophysical source catalogs can be challenging given the large atmospheric neutrino background and underlying non-Gaussian spatial features in the neutrino and source samples. In this paper, we investigate a framework for identifying and statistically evaluating the cross-correlations between IceCube data and an astrophysical source catalog based on the $k$-nearest-neighbor cumulative distribution functions ($k$NN-CDFs). We propose a maximum likelihood estimation procedure for inferring the true proportions of astrophysical neutrinos in the point-source data. We conduct a statistical power analysis of an associated likelihood ratio test, estimating its sensitivity and discovery potential with synthetic neutrino data samples and a WISE-2MASS galaxy sample. We apply the method to IceCube's public ten-year point-source data and find no statistically significant evidence for spatial cross-correlations with the selected galaxy sample. We discuss possible extensions to the current method and explore the method's potential to identify cross-correlation signals in data sets with different sample sizes.
Submitted 23 March, 2025; v1 submitted 2 June, 2024;
originally announced June 2024.
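The $k$NN-CDF summary in the abstract above is the distribution of distances from random query points to their k-th nearest data point. A brute-force sketch, assuming points in a unit square with no boundary correction, and using one simple measure of cross-correlation: the excess of the joint CDF P(d_A ≤ r, d_B ≤ r) over the product of the marginal CDFs, which is near zero for independent samples. Function names and the simulated samples are hypothetical, and the maximum likelihood and likelihood ratio steps are not shown.

```python
# Illustrative sketch; names, data and the excess statistic are hypothetical choices.
import numpy as np


def kth_nn_distance(queries, points, k):
    """Distance from each query to its k-th nearest point (brute force)."""
    d = np.sqrt(((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    return np.sort(d, axis=1)[:, k - 1]


def empirical_cdf(dist, radii):
    """Fraction of queries whose k-th neighbor lies within each radius."""
    return (dist[:, None] <= radii[None, :]).mean(axis=0)


rng = np.random.default_rng(2)
queries = rng.uniform(0.0, 1.0, (2000, 2))
radii = np.linspace(0.0, 0.2, 41)
k = 1

# Sample B is scattered around sample A, so the two are spatially cross-correlated.
sample_a = rng.uniform(0.0, 1.0, (100, 2))
sample_b = sample_a + 0.01 * rng.standard_normal((100, 2))

d_a = kth_nn_distance(queries, sample_a, k)
d_b = kth_nn_distance(queries, sample_b, k)
cdf_a = empirical_cdf(d_a, radii)
cdf_b = empirical_cdf(d_b, radii)
joint = ((d_a[:, None] <= radii) & (d_b[:, None] <= radii)).mean(axis=0)
excess = joint - cdf_a * cdf_b   # clearly positive when the samples cluster together
```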
-
Confidence regions for a persistence diagram of a single image with one or more loops
Authors:
Susan Glenn,
Jessi Cisewski-Kehe,
Jun Zhu,
William M. Bement
Abstract:
Topological data analysis (TDA) uses persistent homology to quantify loops and higher-dimensional holes in data, making it particularly relevant for examining the characteristics of images of cells in the field of cell biology. In the context of a cell injury, as time progresses, a wound in the form of a ring emerges in the cell image and then gradually vanishes. Performing statistical inference on this ring-like pattern in a single image is challenging due to the absence of repeated samples. In this paper, we develop a novel framework leveraging TDA to estimate underlying structures within individual images and quantify associated uncertainties through confidence regions. Our proposed method partitions the image into the background and the damaged cell regions. Pixels within the affected cell region are then used to establish confidence regions in the space of persistence diagrams (topological summary statistics). The method provides estimates of the persistence diagrams that correct the bias of traditional TDA approaches. A simulation study is conducted to evaluate the coverage probabilities of the proposed confidence regions in comparison to an alternative approach that is also proposed in this paper. We also illustrate our methodology with a real-world cell repair example.
Submitted 2 May, 2024;
originally announced May 2024.
-
Practical Guidance for Bayesian Inference in Astronomy
Authors:
Gwendolyn M. Eadie,
Joshua S. Speagle,
Jessi Cisewski-Kehe,
Daniel Foreman-Mackey,
Daniela Huppenkothen,
David E. Jones,
Aaron Springford,
Hyungsuk Tak
Abstract:
In the last two decades, Bayesian inference has become commonplace in astronomy. At the same time, the choice of algorithms, terminology, notation, and interpretation of Bayesian inference varies from one sub-field of astronomy to the next, which can lead to confusion for both those learning and those familiar with Bayesian statistics. Moreover, the choice varies between the astronomy and statistics literature, too. In this paper, our goal is two-fold: (1) provide a reference that consolidates and clarifies terminology and notation across disciplines, and (2) outline practical guidance for Bayesian inference in astronomy. Highlighting both the astronomy and statistics literature, we cover topics such as notation, specification of the likelihood and prior distributions, inference using the posterior distribution, and posterior predictive checking. It is not our intention to introduce the entire field of Bayesian data analysis; rather, we present a series of useful practices for astronomers who already have an understanding of the Bayesian "nuts and bolts" and wish to increase their expertise and extend their knowledge. Moreover, as the field of astrostatistics and astroinformatics continues to grow, we hope this paper will serve as both a helpful reference and a jumping-off point for deeper dives into the statistics and astrostatistics literature.
Submitted 9 February, 2023;
originally announced February 2023.
-
Accounting for stellar activity signals in radial-velocity data by using Change Point Detection techniques
Authors:
U. Simola,
A. Bonfanti,
X. Dumusque,
J. Cisewski-Kehe,
S. Kaski,
J. Corander
Abstract:
Active regions on the photosphere of a star have been the major obstacle for detecting Earth-like exoplanets using the radial velocity (RV) method. A commonly employed solution for addressing stellar activity is to assume a linear relationship between the RV observations and the activity indicators along the entire time series, and then remove the estimated contribution of activity from the variation in RV data (overall correction method). However, since active regions evolve on the photosphere over time, correlations between the RV observations and the activity indicators will correspondingly be anisotropic. We present an approach that identifies the points in the RV time series where the correlations between the RV and the activity indicators significantly change, in order to better account for variations in RV caused by stellar activity. The proposed approach uses a general family of statistical breakpoint methods, often referred to as change point detection (CPD) algorithms, several implementations of which are available in R and Python. A thorough comparison is made between the breakpoint-based approach and the overall correction method. To ensure wide representativity, we use measurements from real stars that have different levels of stellar activity and whose spectra have different signal-to-noise ratios. When the corrections for stellar activity are applied separately to each temporal segment identified by the breakpoint method, the corresponding residuals in the RV time series are typically much smaller than those obtained by the overall correction method. Consequently, the generalized Lomb-Scargle periodogram contains a smaller number of peaks caused by active regions. The CPD algorithm is particularly effective when focusing on active stars with long time series, such as α Cen B.
Submitted 31 May, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
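The segment-wise correction described above can be sketched with the simplest possible change point detector: an exhaustive search for a single breakpoint that minimizes the total residual sum of squares of separate linear fits of RV on an activity indicator, after which each segment's fitted activity contribution is removed. This is a hypothetical toy on simulated data; the paper relies on general CPD algorithms (which handle multiple breakpoints and penalties) from existing R and Python implementations, not on this helper.

```python
# Illustrative sketch; a single-breakpoint toy, not the CPD algorithms used in the paper.
import numpy as np


def linear_fit(x, y):
    """Least-squares fit y ~ c0 + c1 * x; returns (residual sum of squares, residuals)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), resid


def single_breakpoint(indicator, rv, min_size=10):
    """Index k splitting the series into [0, k) and [k, n) with the smallest total RSS."""
    n = len(rv)
    best_k, best_rss = None, np.inf
    for k in range(min_size, n - min_size + 1):
        rss = linear_fit(indicator[:k], rv[:k])[0] + linear_fit(indicator[k:], rv[k:])[0]
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k


rng = np.random.default_rng(3)
n, true_k = 120, 60
indicator = rng.normal(size=n)
# Simulated RV-activity relation whose slope changes at true_k as active regions evolve.
slope = np.where(np.arange(n) < true_k, 1.0, 3.0)
rv = slope * indicator + 0.1 * rng.normal(size=n)

k_hat = single_breakpoint(indicator, rv)
overall_resid = linear_fit(indicator, rv)[1]   # one correction for the whole series
segment_resid = np.concatenate([linear_fit(indicator[:k_hat], rv[:k_hat])[1],
                                linear_fit(indicator[k_hat:], rv[k_hat:])[1]])
```

Comparing `segment_resid` with `overall_resid` mirrors, in miniature, the comparison between the breakpoint-based approach and the overall correction method.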
-
Differentiating small-scale subhalo distributions in CDM and WDM models using persistent homology
Authors:
Jessi Cisewski-Kehe,
Brittany Terese Fasy,
Wojciech Hellwing,
Mark R. Lovell,
Pawel Drozda,
Mike Wu
Abstract:
The spatial distribution of galaxies at sufficiently small scales will encode information about the identity of the dark matter. We develop a novel description of the halo distribution using persistent homology summaries, in which collections of points are decomposed into clusters, loops and voids. We apply these methods, together with a set of hypothesis tests, to dark matter haloes in MW-analog environment regions of the cold dark matter (CDM) and warm dark matter (WDM) Copernicus Complexio $N$-body cosmological simulations. The results of the hypothesis tests find statistically significant differences (p-values $\leq$ 0.001) between the CDM and WDM structures, and the functional summaries of persistence diagrams detect differences at scales that are distinct from the comparison spatial point process functional summaries considered (including the two-point correlation function). The differences between the models are driven most strongly at filtration scales $\sim 100$ kpc, where CDM generates larger numbers of unconnected halo clusters while WDM instead generates loops. This study was conducted on dark matter haloes generally; future work will involve applying the same methods to realistic galaxy catalogues.
Submitted 1 April, 2022;
originally announced April 2022.
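To make the "clusters" part of such a persistence summary concrete: in a Vietoris-Rips filtration of a point cloud, every point is born at scale 0, and 0-dimensional features (connected components) die at the edge lengths that merge clusters, which are exactly the edge lengths of a minimum spanning tree. A small Kruskal/union-find sketch follows, assuming the convention that an edge enters the filtration at its Euclidean length (some software uses half the length); the helper name is hypothetical, and loops and voids require a full persistent homology library.

```python
# Illustrative sketch of H0 persistence only; the helper name is hypothetical.
from itertools import combinations
import math


def h0_death_times(points):
    """Finite death times of connected components in a Vietoris-Rips filtration.

    An edge enters at its Euclidean length. Components die when Kruskal's algorithm
    merges them, so the death times are the minimum-spanning-tree edge lengths.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:      # this edge merges two clusters: one H0 class dies
            parent[root_i] = root_j
            deaths.append(length)
    return deaths                 # one component never dies (infinite persistence)


# Two tight pairs far apart: two short-lived merges and one long-lived separation.
print(h0_death_times([(0, 0), (0, 1), (10, 0), (10, 1)]))  # → [1.0, 1.0, 10.0]
```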
-
A Hermite-Gaussian Based Radial Velocity Estimation Method
Authors:
Parker Holzer,
Jessi Cisewski-Kehe,
Debra Fischer,
Lily Zhao
Abstract:
As the first successful technique used to detect exoplanets orbiting distant stars, the Radial Velocity Method aims to detect a periodic Doppler shift in a star's spectrum. We introduce a new, mathematically rigorous approach to detect such a signal that accounts for functional relationships of neighboring wavelengths, minimizes the role of wavelength interpolation, accounts for heteroskedastic noise, and easily allows for statistical inference. Using Hermite-Gaussian functions, we show that the problem of detecting a Doppler shift in the spectrum can be reduced to linear regression in many settings. A simulation study demonstrates that the proposed method is able to accurately estimate an individual spectrum's radial velocity with precision below 0.3 m/s. Furthermore, the new method outperforms the traditional Cross-Correlation Function approach, reducing the root mean squared error by up to 15 cm/s. The proposed method is also demonstrated on a new set of observations from the EXtreme PREcision Spectrometer (EXPRES) for the star 51 Pegasi, and successfully recovers estimates that agree well with previous studies of this planetary system. Data and Python3 code associated with this work can be found at https://github.com/parkerholzer/hgrv_method. The method is also implemented in the open source R package rvmethod.
Submitted 28 May, 2020;
originally announced May 2020.
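The reduction of Doppler-shift estimation to linear regression can be illustrated with a first-order Taylor expansion: for a small wavelength shift δ, an observed line satisfies f(λ − δ) ≈ f(λ) − δ f′(λ), and the derivative of a Gaussian line profile is proportional to the first Hermite-Gaussian function. The sketch below regresses the difference between a synthetic observed line and its template on −f′(λ) to estimate δ, then converts it to a velocity with v = cδ/λ₀. It assumes a single noiseless Gaussian absorption line with hypothetical parameters; it is not the hgrv_method code, which handles full spectra, heteroskedastic noise, and inference.

```python
# Illustrative sketch; a single hypothetical Gaussian line, not the paper's pipeline.
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def gaussian_line(lam, center, depth, width):
    """Normalized flux of a single Gaussian absorption line."""
    return 1.0 - depth * np.exp(-0.5 * ((lam - center) / width) ** 2)


lam0, depth, width = 5000.0, 0.5, 0.05        # hypothetical line, in Angstroms
lam = np.linspace(lam0 - 1.0, lam0 + 1.0, 2001)
v_true = 10.0                                  # m/s
delta = lam0 * v_true / C                      # Doppler shift in wavelength

observed = gaussian_line(lam, lam0 + delta, depth, width)
template = gaussian_line(lam, lam0, depth, width)

# Analytic derivative of the template; its shape u * exp(-u^2 / 2) is the
# first Hermite-Gaussian function up to a constant.
u = (lam - lam0) / width
d_template = depth * u * np.exp(-0.5 * u**2) / width

# observed ≈ template - delta * d_template is linear in delta: one-parameter OLS.
x = -d_template
y = observed - template
delta_hat = (x @ y) / (x @ x)
v_hat = C * delta_hat / lam0
```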
-
Trend Filtering -- II. Denoising Astronomical Signals with Varying Degrees of Smoothness
Authors:
Collin A. Politsch,
Jessi Cisewski-Kehe,
Rupert A. C. Croft,
Larry Wasserman
Abstract:
Trend filtering---first introduced into the astronomical literature in Paper I of this series---is a state-of-the-art statistical tool for denoising one-dimensional signals that possess varying degrees of smoothness. In this work, we demonstrate the broad utility of trend filtering to observational astronomy by discussing how it can contribute to a variety of spectroscopic and time-domain studies. The observations we discuss are (1) the Lyman-$α$ forest of quasar spectra; (2) more general spectroscopy of quasars, galaxies, and stars; (3) stellar light curves with planetary transits; (4) eclipsing binary light curves; and (5) supernova light curves. We study the Lyman-$α$ forest in the greatest detail---using trend filtering to map the large-scale structure of the intergalactic medium along quasar-observer lines of sight. The remaining studies share broad themes of: (1) estimating observable parameters of light curves and spectra; and (2) constructing observational spectral/light-curve templates. We also briefly discuss the utility of trend filtering as a tool for one-dimensional data reduction and compression.
Submitted 10 January, 2020;
originally announced January 2020.
-
Realizing the potential of astrostatistics and astroinformatics
Authors:
Gwendolyn Eadie,
Thomas J. Loredo,
Ashish A. Mahabal,
Aneta Siemiginowska,
Eric Feigelson,
Eric B. Ford,
S. G. Djorgovski,
Matthew Graham,
Zeljko Ivezic,
Kirk Borne,
Jessi Cisewski-Kehe,
J. E. G. Peek,
Chad Schafer,
Padma A. Yanamandra-Fisher,
C. Alex Young
Abstract:
This Astro2020 State of the Profession Consideration White Paper highlights the growth of astrostatistics and astroinformatics in astronomy, identifies key issues hampering the maturation of these new subfields, and makes recommendations for structural improvements at different levels that, if acted upon, will make significant positive impacts across astronomy.
Submitted 25 September, 2019;
originally announced September 2019.
-
Trend Filtering -- I. A Modern Statistical Tool for Time-Domain Astronomy and Astronomical Spectroscopy
Authors:
Collin A. Politsch,
Jessi Cisewski-Kehe,
Rupert A. C. Croft,
Larry Wasserman
Abstract:
The problem of denoising a one-dimensional signal possessing varying degrees of smoothness is ubiquitous in time-domain astronomy and astronomical spectroscopy. For example, in the time domain, an astronomical object may exhibit a smoothly varying intensity that is occasionally interrupted by abrupt dips or spikes. Likewise, in the spectroscopic setting, a noiseless spectrum typically contains intervals of relative smoothness mixed with localized higher frequency components such as emission peaks and absorption lines. In this work, we present trend filtering, a modern nonparametric statistical tool that yields significant improvements in this broad problem space of denoising spatially heterogeneous signals. When the underlying signal is spatially heterogeneous, trend filtering is superior to any statistical estimator that is a linear combination of the observed data, including kernel smoothers, LOESS, smoothing splines, Gaussian process regression, and many other popular methods. Furthermore, the trend filtering estimate can be computed with practical and scalable efficiency via a specialized convex optimization algorithm, e.g., handling sample sizes of $n\gtrsim10^7$ within a few minutes. In a companion paper, we explicitly demonstrate the broad utility of trend filtering to observational astronomy by carrying out a diverse set of spectroscopic and time-domain analyses.
Submitted 10 January, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
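For evenly spaced data, the order-k trend filtering estimate minimizes ½‖y − β‖² + λ‖D^(k+1)β‖₁, where D^(k+1) is the (k+1)-th order discrete difference operator (up to a scaling constant absorbed into λ); the ℓ₁ penalty is what lets the fit adapt to local smoothness, with k = 0 giving piecewise-constant and k = 1 piecewise-linear fits. Below is a small dense ADMM sketch with hypothetical settings (`lam`, `rho`, `n_iter`). It is not the specialized, scalable algorithm referred to in the abstract and is only sensible for a few thousand points at most.

```python
# Illustrative dense ADMM sketch; names and tuning values are hypothetical.
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft thresholding, the proximal operator of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def trend_filter(y, k, lam, rho=1.0, n_iter=2000):
    """Order-k trend filtering for evenly spaced y via ADMM on the split z = D beta."""
    n = len(y)
    D = np.diff(np.eye(n), n=k + 1, axis=0)        # (n-k-1) x n difference operator
    # The beta-update solves (I + rho D^T D) beta = y + rho D^T (z - u);
    # invert the matrix once, which is fine for small n.
    system_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        beta = system_inv @ (y + rho * D.T @ (z - u))
        z = soft_threshold(D @ beta + u, lam / rho)  # sparsify the differences
        u += D @ beta - z                            # scaled dual update
    return beta


# Usage: a sharp step (spatially heterogeneous signal) observed with noise.
rng = np.random.default_rng(4)
n = 200
truth = np.where(np.arange(n) < 100, 0.0, 3.0)
y = truth + 0.5 * rng.standard_normal(n)
fit = trend_filter(y, k=0, lam=5.0)
mse_raw = np.mean((y - truth) ** 2)
mse_fit = np.mean((fit - truth) ** 2)
```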
-
Adaptive Approximate Bayesian Computation Tolerance Selection
Authors:
Umberto Simola,
Jessica Cisewski-Kehe,
Michael U. Gutmann,
Jukka Corander
Abstract:
Approximate Bayesian Computation (ABC) methods are increasingly used for inference in situations in which the likelihood function is either computationally costly or intractable to evaluate. Extensions of the basic ABC rejection algorithm have improved the computational efficiency of the procedure and broadened its applicability. The ABC-Population Monte Carlo (ABC-PMC) approach of Beaumont et al. (2009) has become a popular choice for approximate sampling from the posterior. ABC-PMC is a sequential sampler with an iteratively decreasing value of the tolerance, which specifies how close the simulated data need to be to the real data for acceptance. We propose a method for adaptively selecting a sequence of tolerances that improves the computational efficiency of the algorithm over other common techniques. In addition, we define a stopping rule as a by-product of the adaptation procedure, which assists in automating termination of sampling. The proposed automatic ABC-PMC algorithm can be easily implemented, and we present several examples demonstrating its benefits in terms of computational efficiency.
Submitted 30 April, 2020; v1 submitted 21 June, 2019;
originally announced July 2019.
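The ABC-PMC sampler that this work builds on can be sketched for a toy problem. The sketch assumes Gaussian data with unknown mean θ and unit variance, a Uniform(−10, 10) prior, the sample mean as summary statistic, and a Gaussian perturbation kernel with twice the weighted particle variance (the choice of Beaumont et al. 2009). Its tolerance schedule is the common baseline of taking a fixed quantile `q` of the previous iteration's accepted distances; it is not the adaptive tolerance rule or stopping rule proposed in the paper, and all names are hypothetical.

```python
# Illustrative ABC-PMC sketch with a quantile-based (baseline) tolerance schedule;
# not the adaptive rule proposed in the paper. Names and settings are hypothetical.
import numpy as np


def abc_pmc(obs, n_particles=500, n_iter=6, q=0.5, seed=0):
    rng = np.random.default_rng(seed)
    s_obs, n = obs.mean(), len(obs)
    lo, hi = -10.0, 10.0                           # uniform prior support

    def distance(theta):
        # Simulate a data set under theta and compare summary statistics.
        return abs(rng.normal(theta, 1.0, n).mean() - s_obs)

    # Iteration 0: draws from the prior, all accepted with equal weights.
    theta = rng.uniform(lo, hi, n_particles)
    dist = np.array([distance(t) for t in theta])
    weights = np.full(n_particles, 1.0 / n_particles)
    tolerances = []

    for _ in range(n_iter):
        eps = np.quantile(dist, q)                 # baseline rule: shrink to a quantile
        tolerances.append(eps)
        mean = np.sum(weights * theta)
        tau2 = 2.0 * np.sum(weights * (theta - mean) ** 2)
        new_theta, new_dist = [], []
        while len(new_theta) < n_particles:
            # Resample a particle by weight, then perturb it.
            parent = theta[rng.choice(n_particles, p=weights)]
            proposal = rng.normal(parent, np.sqrt(tau2))
            if not lo <= proposal <= hi:
                continue                           # zero prior density: reject
            d = distance(proposal)
            if d <= eps:
                new_theta.append(proposal)
                new_dist.append(d)
        new_theta = np.array(new_theta)
        # Importance weights: prior density over the mixture of perturbation kernels.
        # The uniform prior and the kernel's normalizing constant cancel on normalization.
        kernel = np.exp(-0.5 * (new_theta[:, None] - theta[None, :]) ** 2 / tau2)
        new_weights = 1.0 / (kernel @ weights)
        weights = new_weights / new_weights.sum()
        theta, dist = new_theta, np.array(new_dist)

    return theta, weights, tolerances


obs = np.random.default_rng(42).normal(1.5, 1.0, 100)
theta, w, eps = abc_pmc(obs)
post_mean = np.sum(w * theta)
```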
-
A Preferential Attachment Model for the Stellar Initial Mass Function
Authors:
Jessi Cisewski-Kehe,
Grant Weller,
Chad Schafer
Abstract:
Accurate specification of a likelihood function is becoming increasingly difficult in many inference problems in astronomy. As sample sizes resulting from astronomical surveys continue to grow, deficiencies in the likelihood function lead to larger biases in key parameter estimates. These deficiencies result from the oversimplification of the physical processes that generated the data, and from the failure to account for observational limitations. Unfortunately, realistic models often do not yield an analytical form for the likelihood. The estimation of a stellar initial mass function (IMF) is an important example. The stellar IMF is the mass distribution of stars initially formed in a given cluster of stars, a population that is not directly observable due to stellar evolution, other disruptions, and observational limitations of the cluster. There are several difficulties with specifying a likelihood in this setting, since the physical processes and observational challenges result in measurable masses that cannot legitimately be considered independent draws from an IMF. This work improves inference of the IMF by using an approximate Bayesian computation approach that both accounts for observational and astrophysical effects and incorporates a physically-motivated model for star cluster formation. The methodology is illustrated via a simulation study, demonstrating that the proposed approach can recover the true posterior in realistic situations, and is applied to observations drawn from astrophysical simulation data.
Submitted 25 April, 2019;
originally announced April 2019.
-
Measuring precise radial velocities and cross-correlation function line-profile variations using a Skew Normal density
Authors:
Umberto Simola,
Xavier Dumusque,
Jessi Cisewski-Kehe
Abstract:
Stellar activity is one of the primary limitations to the detection of low-mass exoplanets using the radial-velocity (RV) technique. We propose to estimate the variations in shape of the cross-correlation function (CCF) by fitting a Skew Normal (SN) density which, unlike the commonly employed Normal density, includes a skewness parameter to capture the asymmetry of the CCF induced by stellar activity and the convective blueshift. The performance of the proposed method is compared with that of the commonly employed Normal density using both simulations and real observations, with different levels of activity and signal-to-noise ratio. When considering real observations, the correlation between the RV and the asymmetry of the CCF and between the RV and the width of the CCF are stronger when using the parameters estimated with the SN density rather than the ones obtained with the commonly employed Normal density. Using the proposed SN approach, the uncertainties estimated on the RV defined as the median of the SN are on average 10% smaller than the uncertainties calculated on the mean of the Normal. The uncertainties estimated on the asymmetry parameter of the SN are on average 15% smaller than the uncertainties measured on the Bisector Inverse Slope Span (BIS SPAN), which is the commonly used parameter to evaluate the asymmetry of the CCF. We also propose a new model to account for stellar activity when fitting a planetary signal to RV data. Based on simple simulations, we demonstrate that this new model improves the planetary detection limits by 12% compared to the model commonly used to account for stellar activity. The SN density is a better model than the Normal density for characterizing the CCF, since the correlations used to probe stellar activity are stronger and the uncertainties of the RV estimate and the asymmetry of the CCF are both smaller.
Submitted 30 November, 2018;
originally announced November 2018.
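For reference, the Skew Normal density with location ξ, scale ω, and skewness α is f(x) = (2/ω) φ(z) Φ(αz) with z = (x − ξ)/ω, where φ and Φ are the standard Normal density and CDF; α = 0 recovers the Normal density, and the mean is ξ + ωδ√(2/π) with δ = α/√(1 + α²). The sketch below evaluates the density on a grid and computes its median numerically, the quantity the abstract uses as the RV estimate. Parameter values and names are hypothetical, and fitting the density to an observed CCF is not shown.

```python
# Illustrative sketch; parameter values and names are hypothetical.
import math
import numpy as np

_erf = np.vectorize(math.erf)


def skew_normal_pdf(x, xi, omega, alpha):
    """Skew Normal density (2 / omega) * phi(z) * Phi(alpha * z), z = (x - xi) / omega."""
    z = (x - xi) / omega
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + _erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / omega * phi * Phi


xi, omega, alpha = 0.0, 3.0, 2.0                 # hypothetical values, e.g. in km/s
x = np.linspace(xi - 10 * omega, xi + 10 * omega, 20001)
dx = x[1] - x[0]
pdf = skew_normal_pdf(x, xi, omega, alpha)

cdf = np.cumsum(pdf) * dx                        # numerical CDF on the grid
median = float(np.interp(0.5, cdf, x))           # median-based location estimate
delta = alpha / math.sqrt(1.0 + alpha**2)
mean = xi + omega * delta * math.sqrt(2.0 / math.pi)   # closed-form mean
```

For α > 0 the median sits to the right of ξ, so ξ, the median, and the mean are distinct location summaries once the profile is asymmetric.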
-
Functional Summaries of Persistence Diagrams
Authors:
Eric Berry,
Yen-Chi Chen,
Jessi Cisewski-Kehe,
Brittany Terese Fasy
Abstract:
One of the primary areas of interest in applied algebraic topology is persistent homology, and, more specifically, the persistence diagram. Persistence diagrams have also become objects of interest in topological data analysis. However, persistence diagrams do not naturally lend themselves to statistical goals, such as inferring certain population characteristics, because their complicated structure makes common algebraic operations, such as addition, division, and multiplication, challenging; for example, the mean might not be unique. To bypass these issues, several functional summaries of persistence diagrams have been proposed in the literature (e.g., landscape and silhouette functions). The problem of analyzing a set of persistence diagrams then becomes the problem of analyzing a set of functions, which is a topic that has been studied for decades in statistics. First, we review the various functional summaries in the literature and propose a unified framework for the functional summaries. Then, we generalize the definition of persistence landscape functions, establish several theoretical properties of the persistence functional summaries, and demonstrate and discuss their performance in the context of classification using simulated prostate cancer histology data, and two-sample hypothesis tests comparing human and monkey fibrin images, after developing a simulation study using a new data generator we call the Pickup Sticks Simulator (STIX).
Submitted 4 April, 2018;
originally announced April 2018.
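Two of the functional summaries mentioned above can be computed directly from a diagram of (birth, death) pairs. Each pair contributes a tent function Λ_j(t) = max(0, min(t − b_j, d_j − t)); the k-th landscape function λ_k(t) is the k-th largest tent value at t, and the power-weighted silhouette averages the tents with weights (d_j − b_j)^p. A minimal sketch of the standard definitions (not the generalized landscapes introduced in the paper), with hypothetical function names:

```python
# Illustrative sketch of standard landscape/silhouette functions; names are hypothetical.
import numpy as np


def tents(diagram, t):
    """Tent values max(0, min(t - b, d - t)) for every (b, d) pair; shape (pairs, len(t))."""
    b = diagram[:, :1]
    d = diagram[:, 1:]
    return np.maximum(0.0, np.minimum(t - b, d - t))


def landscape(diagram, t, k):
    """k-th persistence landscape function (k = 1 is the largest tent value at each t)."""
    values = np.sort(tents(diagram, t), axis=0)[::-1]   # descending at each t
    if k > len(values):
        return np.zeros_like(t, dtype=float)
    return values[k - 1]


def silhouette(diagram, t, p=1.0):
    """Power-weighted silhouette with weights (d - b)^p."""
    w = (diagram[:, 1] - diagram[:, 0]) ** p
    return (w[:, None] * tents(diagram, t)).sum(axis=0) / w.sum()


diagram = np.array([[0.0, 4.0], [1.0, 3.0]])
t = np.array([2.0])
print(landscape(diagram, t, 1), landscape(diagram, t, 2), silhouette(diagram, t))
```

At t = 2 the two tents take the values 2 and 1, so λ₁ = 2, λ₂ = 1, and the silhouette with p = 1 is (4·2 + 2·1)/6 = 5/3.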
-
Approximate Bayesian Computation for Finite Mixture Models
Authors:
Umberto Simola,
Jessi Cisewski-Kehe,
Robert L. Wolpert
Abstract:
Finite mixture models are used in statistics and other disciplines, but inference for mixture models is challenging due, in part, to the multimodality of the likelihood function and the so-called label switching problem. We propose extensions of the Approximate Bayesian Computation-Population Monte Carlo (ABC-PMC) algorithm as an alternative framework for inference on finite mixture models. There are several decisions to make when implementing an ABC-PMC algorithm for finite mixture models, including the selection of the kernels used for moving the particles through the iterations, how to address the label switching problem, and the choice of informative summary statistics. Examples are presented to demonstrate the performance of the proposed ABC-PMC algorithm for mixture modeling. The performance of the proposed method is evaluated in a simulation study and on the popular galaxy recessional velocity data.
Submitted 2 November, 2020; v1 submitted 27 March, 2018;
originally announced March 2018.