-
A Principled Path to Fitted Distributional Evaluation
Authors:
Sungee Hong,
Jiayi Wang,
Zhengling Qi,
Raymond Ka Wai Wong
Abstract:
In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation -- developed for expectation-based reinforcement learning -- to the distributional OPE setting. We refer to this extension as fitted distributional evaluation (FDE). While only a few related approaches exist, there remains no unified framework for designing FDE methods. To fill this gap, we present a set of guiding principles for constructing theoretically grounded FDE methods. Building on these principles, we develop several new FDE methods with convergence analysis and provide theoretical justification for existing methods, even in non-tabular environments. Extensive experiments, including simulations on linear quadratic regulators and Atari games, demonstrate the superior performance of the FDE methods.
Submitted 24 June, 2025;
originally announced June 2025.
-
Efficient Estimation under Multiple Missing Patterns via Balancing Weights
Authors:
Jianing Dong,
Raymond K. W. Wong,
Kwun Chuen Gary Chan
Abstract:
As one of the most commonly seen data challenges, missing data, in particular, multiple, non-monotone missing patterns, complicates estimation and inference because missingness mechanisms are often not missing at random, and conventional methods cannot be applied. Pattern graphs have recently been proposed as a tool to systematically relate various observed patterns in the sample. We extend their scope to the estimation of parameters defined by moment equations, including common regression models, via solving weighted estimating equations with weights constructed using a sequential balancing approach. These novel weights are carefully crafted to address the instability issue of the straightforward approach based on local balancing. We derive the efficiency bound for the model parameters and show that our proposed method, albeit relatively simple, is asymptotically efficient. Simulation results demonstrate the superior performance of the proposed method, and real-data applications illustrate how the results are robust to the choice of identification assumptions.
Submitted 18 April, 2025;
originally announced April 2025.
-
Integration of Explainable AI Techniques with Large Language Models for Enhanced Interpretability for Sentiment Analysis
Authors:
Thivya Thogesan,
Anupiya Nugaliyadde,
Kok Wai Wong
Abstract:
Interpretability remains a key difficulty in sentiment analysis with Large Language Models (LLMs), particularly in high-stakes applications where it is crucial to comprehend the rationale behind predictions. This research addresses this by introducing a technique that applies SHAP (Shapley Additive Explanations) by breaking down LLMs into components such as the embedding layer, encoder, decoder, and attention layer to provide a layer-by-layer understanding of sentiment prediction. The approach offers a clearer overview of how the model interprets and categorises sentiment by breaking down LLMs into these parts. The method is evaluated using the Stanford Sentiment Treebank (SST-2) dataset, which shows how different sentences affect different layers. The effectiveness of layer-wise SHAP analysis in clarifying sentiment-specific token attributions is demonstrated by experimental evaluations, which provide a notable enhancement over current whole-model explainability techniques. These results highlight how the suggested approach could improve the reliability and transparency of LLM-based sentiment analysis in crucial applications.
Submitted 14 March, 2025;
originally announced March 2025.
-
Energy Diffusion and Advection Coefficients in Kinetic Simulations of Relativistic Plasma Turbulence
Authors:
Kai W. Wong,
Vladimir Zhdankin,
Dmitri A. Uzdensky,
Gregory R. Werner,
Mitchell C. Begelman
Abstract:
Turbulent, relativistic nonthermal plasmas are ubiquitous in high-energy astrophysical systems, as inferred from broadband nonthermal emission spectra. The underlying turbulent nonthermal particle acceleration (NTPA) processes have traditionally been modelled with a Fokker-Planck (FP) diffusion-advection equation for the particle energy distribution. We test FP-type NTPA theories by performing and analysing particle-in-cell (PIC) simulations of turbulence in collisionless relativistic pair plasma. By tracking large numbers of particles in simulations with different initial magnetisation and system size, we first test and confirm the applicability of the FP framework. We then measure the FP energy diffusion ($D$) and advection ($A$) coefficients as functions of particle energy $γm c^2$, and compare their dependence to theoretical predictions. At high energies, we robustly find $D \sim γ^2$ for all cases. Hence, we fit $D = D_0 γ^2$ and find a scaling consistent with $D_0 \sim σ^{3/2}$ at low instantaneous magnetisation $σ(t)$, flattening to $D_0 \sim σ$ at higher $σ\sim 1$. We also find that the power-law index $α(t)$ of the particle energy distribution converges exponentially in time. We build and test an analytic model connecting the FP coefficients and $α(t)$, predicting $A(γ) \sim γ\log γ$. We confirm this functional form in our measurements of $A(γ,t)$, which allows us to predict $α(t)$ through the model relations. Our results suggest that the basic second-order Fermi acceleration model, which predicts $D_0 \sim σ$, may not be a complete description of NTPA in turbulent plasmas. These findings encourage further application of tracked particles and FP coefficients as a diagnostic in kinetic simulations of various astrophysically relevant plasma processes like collisionless shocks and magnetic reconnection.
Submitted 5 February, 2025;
originally announced February 2025.
-
Peering into the black box: forward-modeling the uncertainty budget of high-resolution spectroscopy of exoplanet atmospheres
Authors:
Arjun B. Savel,
Megan Bedell,
Eliza M. -R. Kempton,
Peter Smith,
Jacob L. Bean,
Lily L. Zhao,
Kaze W. K. Wong,
Jorge A. Sanchez,
Michael R. Line
Abstract:
Ground-based high-resolution cross-correlation spectroscopy (HRCCS; R >~ 15,000) is a powerful complement to space-based studies of exoplanet atmospheres. By resolving individual spectral lines, HRCCS can precisely measure chemical abundance ratios, directly constrain atmospheric dynamics, and robustly probe multidimensional physics. But the subtleties of HRCCS datasets -- e.g., the lack of exoplanetary spectra visible by eye and the statistically complex process of telluric removal -- can make interpreting them difficult. In this work, we seek to clarify the uncertainty budget of HRCCS with a forward-modeling approach. We present a HRCCS observation simulator, scope (https://github.com/arjunsavel/scope), that incorporates spectral contributions from the exoplanet, star, tellurics, and instrument. This tool allows us to control the underlying dataset, enabling controlled experimentation with complex HRCCS methods. Simulating a fiducial hot Jupiter dataset (WASP-77Ab emission with IGRINS), we first confirm via multiple tests that the commonly used principal components analysis does not bias the planetary signal when few components are used. Furthermore, we demonstrate that mildly varying tellurics and moderate wavelength solution errors induce only mild decreases in HRCCS detection significance. However, limiting-case, strongly varying tellurics can bias the retrieved velocities and gas abundances. Additionally, in the low-SNR limit, constraints on gas abundances become highly non-Gaussian. Our investigation of the uncertainties and potential biases inherent in HRCCS data analysis enables greater confidence in scientific results from this maturing method.
Submitted 6 January, 2025; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Super-Resolution without High-Resolution Labels for Black Hole Simulations
Authors:
Thomas Helfer,
Thomas D. P. Edwards,
Jessica Dafflon,
Kaze W. K. Wong,
Matthew Lyle Olson
Abstract:
Generating high-resolution simulations is key for advancing our understanding of one of the universe's most violent events: Black Hole mergers. However, generating Black Hole simulations is limited by prohibitive computational costs and scalability issues, reducing the simulation's fidelity and resolution achievable within reasonable time frames and resources. In this work, we introduce a novel method that circumvents these limitations by applying a super-resolution technique without directly needing high-resolution labels, leveraging the Hamiltonian and momentum constraints, which are fundamental equations in general relativity that govern the dynamics of spacetime. We demonstrate that our method achieves a reduction in constraint violation by one to two orders of magnitude and generalizes effectively to out-of-distribution simulations.
Submitted 3 November, 2024;
originally announced November 2024.
-
Accelerated Bayesian parameter estimation and model selection for gravitational waves with normalizing flows
Authors:
Alicja Polanska,
Thibeau Wouters,
Peter T. H. Pang,
Kaze K. W. Wong,
Jason D. McEwen
Abstract:
We present an accelerated pipeline, based on high-performance computing techniques and normalizing flows, for joint Bayesian parameter estimation and model selection and demonstrate its efficiency in gravitational wave astrophysics. We integrate the Jim inference toolkit, a normalizing flow-enhanced Markov chain Monte Carlo (MCMC) sampler, with the learned harmonic mean estimator. Our Bayesian evidence estimates run on $1$ GPU are consistent with traditional nested sampling techniques run on $16$ CPU cores, while reducing the computation time by factors of $5\times$ and $15\times$ for $4$-dimensional and $11$-dimensional gravitational wave inference problems, respectively. Our code is available in well-tested and thoroughly documented open-source packages, ensuring accessibility and reproducibility for the wider research community.
Submitted 31 October, 2024; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Gravitational-Wave Parameter Estimation in non-Gaussian noise using Score-Based Likelihood Characterization
Authors:
Ronan Legin,
Maximiliano Isi,
Kaze W. K. Wong,
Yashar Hezaveh,
Laurence Perreault-Levasseur
Abstract:
Gravitational-wave (GW) parameter estimation typically assumes that instrumental noise is Gaussian and stationary. Obvious departures from this idealization are typically handled on a case-by-case basis, e.g., through bespoke procedures to ``clean'' non-Gaussian noise transients (glitches), as was famously the case for the GW170817 neutron-star binary. Although effective, manipulating the data in this way can introduce biases in the inference of key astrophysical properties, like binary precession, and compound in unpredictable ways when combining multiple observations; alternative procedures free of the same biases, like joint inference of noise and signal properties, have so far proved too computationally expensive to execute at scale. Here we take a different approach: rather than explicitly modeling individual non-Gaussianities to then apply the traditional GW likelihood, we seek to learn the true distribution of instrumental noise without presuming Gaussianity and stationarity in the first place. Assuming only noise additivity, we employ score-based diffusion models to learn an empirical noise distribution directly from detector data and then combine it with a deterministic waveform model to provide an unbiased estimate of the likelihood function. We validate the method by performing inference on a subset of GW parameters from 400 mock observations, containing real LIGO noise from either the Livingston or Hanford detectors. We show that the proposed method can recover the true parameters even in the presence of loud glitches, and that the inference is unbiased over a population of signals without applying any cleaning to the data. This work provides a promising avenue for extracting unbiased source properties in future GW observations over the coming decade.
Submitted 25 October, 2024;
originally announced October 2024.
-
A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
Authors:
Jiayi Wang,
Zhengling Qi,
Raymond K. W. Wong
Abstract:
In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive theoretical understanding of FQE estimators under both parametric and nonparametric models on the $Q$-function. Specifically, we address three key questions related to FQE that remain largely unexplored in the current literature: (1) Is the optimal convergence rate for estimating the policy value regarding the sample size $n$ ($n^{-1/2}$) achievable for FQE under a non-parametric model with a fixed horizon ($T$)? (2) How does the error bound depend on the horizon $T$? (3) What is the role of the probability ratio function in improving the convergence of FQE estimators? In particular, we show that under the completeness assumption of $Q$-functions, which is mild in the non-parametric setting, the estimation errors for policy value using both parametric and non-parametric FQE estimators can achieve an optimal rate in terms of $n$. The corresponding error bounds in terms of both $n$ and $T$ are also established. With an additional realizability assumption on ratio functions, the rate of estimation errors can be improved from $T^{1.5}/\sqrt{n}$ to $T/\sqrt{n}$, which matches the sharpest known bound in the current literature under the tabular setting.
Submitted 14 June, 2024;
originally announced June 2024.
-
Balancing Weights for Non-monotone Missing Data
Authors:
Jianing Dong,
Raymond K. W. Wong,
Kwun Chuen Gary Chan
Abstract:
Balancing weights have been widely applied to single or monotone missingness due to empirical advantages over likelihood-based methods and inverse probability weighting approaches. This paper considers non-monotone missing data under the complete-case missing variable condition (CCMV), a case of missing not at random (MNAR). Using relationships between each missing pattern and the complete-case subsample, we construct a weighted estimator for estimation, where the weight is a sum of ratios of the conditional probability of observing a particular missing pattern versus that of observing the complete-case, given the variables observed in the corresponding missing pattern. However, plug-in estimators of the propensity odds can be unbounded and lead to unstable estimation. Using further relations between propensity odds and balancing of moments across response patterns, we employ tailored loss functions, each encouraging empirical balance across patterns to estimate propensity odds flexibly using a functional basis expansion. We propose two penalizations to control propensity odds model smoothness and empirical imbalance. We study the asymptotic properties of the proposed estimators and show that they are consistent under mild smoothness assumptions. Asymptotic normality and efficiency are developed. Simulation results show the superior performance of the proposed method.
Submitted 12 December, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Birefringence tests of gravity with multi-messenger binaries
Authors:
Macarena Lagos,
Leah Jenks,
Maximiliano Isi,
Kenta Hotokezaka,
Brian D. Metzger,
Eric Burns,
Will M. Farr,
Scott Perkins,
Kaze W. K. Wong,
Nicolas Yunes
Abstract:
Extensions to General Relativity (GR) allow the polarization of gravitational waves (GW) from astrophysical sources to suffer from amplitude and velocity birefringence, which respectively induce changes in the ellipticity and orientation of the polarization tensor. We introduce a multi-messenger approach to test this polarization behavior of GWs during their cosmological propagation using binary sources, for which the initial polarization is determined by the inclination and orientation angles of the orbital angular momentum vector with respect to the line of sight. In particular, we use spatially-resolved radio imaging of the jet from a binary neutron star (BNS) merger to constrain the orientation angle and hence the emitted polarization orientation of the GW signal at the site of the merger, and compare to that observed on Earth by GW detectors. For GW170817 we constrain the deviation from GR due to amplitude birefringence to $κ_A = -0.12^{+0.60}_{-0.61}$, while the velocity birefringence parameter $κ_V$ remains unconstrained. The inability to constrain $κ_V$ is due to the fact that Virgo did not detect GW170817, and measurements of the polarization orientation require information from a combination of multiple detectors with different alignments. For this reason, we also mock future BNS mergers with resolved afterglow proper motion and project that $κ_V$ could be constrained to a precision of $5\,$rad (corresponding to an angular shift of the GW polarization of $δφ_V\approx 0.2\,$rad for a BNS at $100\,$Mpc) by a future network of third-generation ground-based GW detectors such as Cosmic Explorer and the radio High Sensitivity Array. Crucially, this velocity birefringence effect cannot be constrained with dark binary mergers as it requires polarization information at the emission time, which can be provided only by electromagnetic emission.
Submitted 7 February, 2024;
originally announced February 2024.
-
Distributional Off-policy Evaluation with Bellman Residual Minimization
Authors:
Sungee Hong,
Zhengling Qi,
Raymond K. W. Wong
Abstract:
We study distributional off-policy evaluation (OPE), whose goal is to learn the distribution of the return for a target policy using offline data generated by a different policy. The theoretical foundation of much existing work relies on supremum-extended statistical distances, such as the supremum-Wasserstein distance, which are hard to estimate. In contrast, we study the more manageable expectation-extended statistical distances and provide a novel theoretical justification for their validity in learning the return distribution. Based on this attractive property, we propose a new method called Energy Bellman Residual Minimizer (EBRM) for distributional OPE. We provide corresponding in-depth theoretical analyses. We establish a finite-sample error bound for the EBRM estimator under the realizability assumption. Furthermore, we introduce a variant of our method based on a multi-step extension which improves the error bound for non-realizable settings. Notably, unlike prior distributional OPE methods, the theoretical guarantees of our method do not require the completeness assumption.
Submitted 12 March, 2025; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Directed Cyclic Graph for Causal Discovery from Multivariate Functional Data
Authors:
Saptarshi Roy,
Raymond K. W. Wong,
Yang Ni
Abstract:
Discovering causal relationships using multivariate functional data has received a significant amount of attention very recently. In this article, we introduce a functional linear structural equation model for causal structure learning when the underlying graph involving the multivariate functions may have cycles. To enhance interpretability, our model involves a low-dimensional causal embedded space such that all the relevant causal information in the multivariate functional data is preserved in this lower-dimensional subspace. We prove that the proposed model is causally identifiable under standard assumptions that are often made in the causal discovery literature. To carry out inference of our model, we develop a fully Bayesian framework with suitable prior specifications and uncertainty quantification through posterior summaries. We illustrate the superior performance of our method over existing methods in terms of causal graph estimation through extensive simulation studies. We also demonstrate the proposed method using a brain EEG dataset.
Submitted 31 October, 2023;
originally announced October 2023.
-
Prediction of Tropical Pacific Rain Rates with Over-parameterized Neural Networks
Authors:
Hojun You,
Jiayi Wang,
Raymond K. W. Wong,
Courtney Schumacher,
R. Saravanan,
Mikyoung Jun
Abstract:
The prediction of tropical rain rates from atmospheric profiles poses significant challenges, mainly due to the heavy-tailed distribution exhibited by tropical rainfall. This study introduces over-parameterized neural networks not only to forecast tropical rain rates, but also to explain their heavy-tailed distribution. The prediction is separately conducted for three rain types (stratiform, deep convective, and shallow convective) observed by the Global Precipitation Measurement satellite radar over the West and East Pacific regions. Atmospheric profiles of humidity, temperature, and zonal and meridional winds from the MERRA-2 reanalysis are considered as features. Although over-parameterized neural networks are well-known for their ``double descent phenomenon,'' little has been explored about their applicability to climate data and capability of capturing the tail behavior of data. In our results, over-parameterized neural networks accurately predict the rain rate distributions and outperform other machine learning methods. Spatial maps show that over-parameterized neural networks also successfully describe spatial patterns of each rain type across the tropical Pacific. In addition, we assess the feature importance for each over-parameterized neural network to provide insight into the key factors driving the predictions, with low-level humidity and temperature variables being the overall most important. These findings highlight the capability of over-parameterized neural networks in predicting the distribution of the rain rate and explaining extreme values.
Submitted 20 February, 2024; v1 submitted 22 September, 2023;
originally announced September 2023.
-
AspGap: Augmented Stellar Parameters and Abundances for 23 million RGB stars from Gaia XP low-resolution spectra
Authors:
Jiadong Li,
Kaze W. K. Wong,
David W. Hogg,
Hans-Walter Rix,
Vedant Chandra
Abstract:
We present AspGap, a new approach to infer stellar labels from low-resolution Gaia XP spectra, including precise [$α$/M] estimates for the first time. AspGap is a neural-network based regression model trained on APOGEE spectra. In the training step, AspGap learns to use XP spectra not only to predict stellar labels but also the high-resolution APOGEE spectra that lead to the same stellar labels. The inclusion of this last model component -- dubbed the hallucinator -- creates a more physically motivated mapping and significantly improves the prediction of stellar labels in the validation, particularly of [$α$/M]. For giant stars, we find cross-validated rms accuracies for Teff, log g, [M/H], [$α$/M] of ~1%, 0.12 dex, 0.07 dex, 0.03 dex, respectively. We also validate our labels through comparison with external datasets and through a range of astrophysical tests that demonstrate that we are indeed determining [$α$/M] from the XP spectra, rather than just inferring it indirectly from correlations with other labels. We publicly release the AspGap codebase, along with our stellar parameter catalog for all giants observed by Gaia XP. AspGap enables new insights into the formation and chemo-dynamics of our Galaxy by providing precise [$α$/M] estimates for 23 million giant stars, including 12 million with radial velocities from Gaia.
Submitted 25 September, 2023;
originally announced September 2023.
-
Flexible Functional Treatment Effect Estimation
Authors:
Jiayi Wang,
Raymond K. W. Wong,
Xiaoke Zhang,
Kwun Chuen Gary Chan
Abstract:
We study treatment effect estimation with functional treatments where the average potential outcome functional is a function of functions, in contrast to continuous treatment effect estimation where the target is a function of real numbers. By considering a flexible scalar-on-function marginal structural model, a weight-modified kernel ridge regression (WMKRR) is adopted for estimation. The weights are constructed by directly minimizing the uniform balancing error resulting from a decomposition of the WMKRR estimator, instead of being estimated under a particular treatment selection model. Despite the complex structure of the uniform balancing error derived under WMKRR, finite-dimensional convex algorithms can be applied to efficiently solve for the proposed weights thanks to a representer theorem. The optimal convergence rate is shown to be attainable by the proposed WMKRR estimator without any smoothness assumption on the true weight function. Corresponding empirical performance is demonstrated by a simulation study and a real data application.
Submitted 12 November, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Recalibrating Gravitational Wave Phenomenological Waveform Model
Authors:
Kelvin K. H. Lam,
Kaze W. K. Wong,
Thomas D. P. Edwards
Abstract:
We investigate the possibility of improving the accuracy of the phenomenological waveform model, IMRPhenomD, by jointly optimizing all the calibration coefficients at once, given a set of numerical relativity (NR) waveforms. When IMRPhenomD was first calibrated to NR waveforms, different parts (i.e., the inspiral, merger, and ringdown) of the waveform were calibrated separately. Using ripple, a library of waveform models compatible with automatic differentiation, we can, for the first time, perform gradient-based optimization on all the waveform coefficients at the same time. This joint optimization process allows us to capture previously ignored correlations between separate parts of the waveform. We found that after recalibration, the median mismatch between the model and NR waveforms decreases by 50%. We further explore how different regions of the source parameter space respond to the optimization procedure. We find that the degree of improvement correlates with the spins of the source. This work shows a promising avenue to help understand and treat systematic error in waveform models.
Submitted 29 June, 2023;
originally announced June 2023.
-
Equivariant geometric convolutions for emulation of dynamical systems
Authors:
Wilson G. Gregory,
David W. Hogg,
Ben Blum-Smith,
Maria Teresa Arias,
Kaze W. K. Wong,
Soledad Villar
Abstract:
Machine learning methods are increasingly being employed as surrogate models in place of computationally expensive and slow numerical integrators for a bevy of applications in the natural sciences. However, while the laws of physics are relationships between scalars, vectors, and tensors that hold regardless of the frame of reference or chosen coordinate system, surrogate machine learning models are not coordinate-free by default. We enforce coordinate freedom by using geometric convolutions in three model architectures: a ResNet, a Dilated ResNet, and a UNet. In numerical experiments emulating 2D compressible Navier-Stokes, we see better accuracy and improved stability compared to baseline surrogate models in almost all cases. The ease of enforcing coordinate freedom without making major changes to the model architecture provides an exciting recipe for any CNN-based method applied to an appropriate class of problems.
Submitted 1 November, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Constraining gravitational wave amplitude birefringence with GWTC-3
Authors:
Thomas C. K. Ng,
Maximiliano Isi,
Kaze W. K. Wong,
Will M. Farr
Abstract:
The propagation of gravitational waves can reveal fundamental features of the structure of spacetime. For instance, differences in the propagation of gravitational-wave polarizations would be a smoking gun for parity violations in the gravitational sector, as expected from birefringent theories like Chern-Simons gravity. Here we look for evidence of amplitude birefringence in the third catalog of detections by the Laser Interferometer Gravitational Wave Observatory and Virgo through the use of birefringent templates inspired by dynamical Chern-Simons gravity. From $71$ binary-black-hole signals, we obtain the most precise constraints on gravitational-wave amplitude birefringence yet, measuring a birefringent attenuation of $κ= -0.019^{+0.038}_{-0.029} \, \mathrm{Gpc}^{-1}$ at $100 \, \mathrm{Hz}$ with $90\%$ credibility, equivalent to a parity-violation energy scale of $M_{\rm PV} \gtrsim 6.8 \times 10^{-21}\, {\rm GeV}$.
Submitted 30 October, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Bayesian Nonlinear Tensor Regression with Functional Fused Elastic Net Prior
Authors:
Shuoli Chen,
Kejun He,
Shiyuan He,
Yang Ni,
Raymond K. W. Wong
Abstract:
Tensor regression methods have been widely used to predict a scalar response from covariates in the form of a multiway array. In many applications, the regions of tensor covariates used for prediction are often spatially connected with unknown shapes and discontinuous jumps on the boundaries. Moreover, the relationship between the response and the tensor covariates can be nonlinear. In this article, we develop a nonlinear Bayesian tensor additive regression model to accommodate such spatial structure. A functional fused elastic net prior is proposed over the additive component functions to comprehensively model the nonlinearity and spatial smoothness, detect the discontinuous jumps, and simultaneously identify the active regions. The great flexibility and interpretability of the proposed method against the alternatives are demonstrated by a simulation study and an analysis on facial feature data.
Submitted 16 February, 2023;
originally announced February 2023.
-
Fast gravitational wave parameter estimation without compromises
Authors:
Kaze W. K. Wong,
Maximiliano Isi,
Thomas D. P. Edwards
Abstract:
We present a lightweight, flexible, and high-performance framework for inferring the properties of gravitational-wave events. By combining likelihood heterodyning, automatically-differentiable and accelerator-compatible waveforms, and gradient-based Markov chain Monte Carlo (MCMC) sampling enhanced by normalizing flows, we achieve full Bayesian parameter estimation for real events like GW150914 and GW170817 within a minute of sampling time. Our framework does not require pretraining or explicit reparameterizations and can be generalized to handle higher dimensional problems. We present the details of our implementation and discuss trade-offs and future developments in the context of other proposed strategies for real-time parameter estimation. Our code for running the analysis is publicly available on GitHub at https://github.com/kazewong/jim.
Submitted 8 February, 2023;
originally announced February 2023.
-
ripple: Differentiable and Hardware-Accelerated Waveforms for Gravitational Wave Data Analysis
Authors:
Thomas D. P. Edwards,
Kaze W. K. Wong,
Kelvin K. H. Lam,
Adam Coogan,
Daniel Foreman-Mackey,
Maximiliano Isi,
Aaron Zimmerman
Abstract:
We propose the use of automatic differentiation through the programming framework jax for accelerating a variety of analysis tasks throughout gravitational wave (GW) science. Firstly, we demonstrate that complete waveforms which cover the inspiral, merger, and ringdown of binary black holes (i.e. IMRPhenomD) can be written in jax and demonstrate that the serial evaluation speed of the waveform (and its derivative) is similar to the lalsuite implementation in C. Moreover, jax allows for GPU-accelerated waveform calls which can be over an order of magnitude faster than serial evaluation on a CPU. We then focus on three applications where efficient and differentiable waveforms are essential. Firstly, we demonstrate how gradient descent can be used to optimize the $\sim 200$ coefficients that are used to calibrate the waveform model. In particular, we demonstrate that the typical match with numerical relativity waveforms can be improved by more than 50% without any additional overhead. Secondly, we show that Fisher forecasting calculations can be sped up by $\sim 100\times$ (on a CPU) with no loss in accuracy. This increased speed makes population forecasting substantially simpler. Finally, we show that gradient-based samplers like Hamiltonian Monte Carlo lead to significantly reduced autocorrelation values when compared to traditional Monte Carlo methods. Since differentiable waveforms have substantial advantages for a variety of tasks throughout GW science, we propose that waveform developers use jax to build new waveforms moving forward. Our waveform code, ripple, can be found at https://github.com/tedwards2412/ripple, and will continue to be updated with new waveforms as they are implemented.
Submitted 8 February, 2023;
originally announced February 2023.
-
Implicit Regularization for Group Sparsity
Authors:
Jiangyuan Li,
Thanh V. Nguyen,
Chinmay Hegde,
Raymond K. W. Wong
Abstract:
We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In contrast to many existing works in understanding implicit regularization, we prove that our training trajectory cannot be simulated by mirror descent. We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates. Compared to existing bounds for implicit sparse regularization using diagonal linear networks, our analysis with the new reparameterization shows improved sample complexity. In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression. Finally, we demonstrate the efficacy of our approach with several numerical experiments.
Submitted 29 January, 2023;
originally announced January 2023.
-
flowMC: Normalizing-flow enhanced sampling package for probabilistic inference in Jax
Authors:
Kaze W. K. Wong,
Marylou Gabrié,
Daniel Foreman-Mackey
Abstract:
flowMC is a Python library for accelerated Markov Chain Monte Carlo (MCMC) leveraging deep generative modeling. It is built on top of the machine learning libraries JAX and Flax. At its core, flowMC uses a local sampler and a learnable global sampler in tandem to efficiently sample posterior distributions. While multiple chains of the local sampler generate samples over the region of interest in the target parameter space, the package uses these samples to train a normalizing flow model, then uses it to propose global jumps across the parameter space. The flowMC sampler can handle non-trivial geometry, such as multimodal distributions and distributions with local correlations.
The key features of flowMC are summarized in the following list:
* Since flowMC is built on top of JAX, it supports gradient-based samplers through automatic differentiation, such as MALA and Hamiltonian Monte Carlo (HMC).
* flowMC uses state-of-the-art normalizing flow models such as Rational-Quadratic Splines to power its global sampler. These models are very efficient in capturing important features within a relatively short training time.
* Use of accelerators such as GPUs and TPUs is natively supported. The code also supports the use of multiple accelerators with SIMD parallelism.
* By default, just-in-time (JIT) compilation is used to further speed up the sampling process.
* We provide a simple black-box interface for users who want to use flowMC with its default parameters, and at the same time an extensive guide explaining the trade-offs involved in tuning the sampler parameters.
The tight integration of all the above features makes flowMC a highly performant yet simple-to-use package for statistical inference.
Submitted 10 November, 2022;
originally announced November 2022.
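The local-plus-global sampling scheme described in the flowMC abstract can be illustrated with a toy sketch. This is not flowMC's API or implementation: it uses only the Python standard library, a one-dimensional bimodal target, a random-walk Metropolis move as the local sampler, and a single Gaussian refitted to the pooled chain samples as a stand-in for the normalizing flow. All function names and parameters below are hypothetical.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of an equal mixture of N(-3, 1) and N(3, 1)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_normal(x, mu, sigma):
    """Log-density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def local_step(x, rng, step=0.5):
    """Local sampler: random-walk Metropolis, which explores the current mode."""
    prop = x + step * rng.gauss(0.0, 1.0)
    if math.log(1.0 - rng.random()) < log_target(prop) - log_target(x):
        return prop
    return x


def global_step(x, rng, mu, sigma):
    """Global sampler: independence Metropolis-Hastings with the fitted proposal."""
    prop = rng.gauss(mu, sigma)
    log_ratio = (log_target(prop) - log_target(x)
                 + log_normal(x, mu, sigma) - log_normal(prop, mu, sigma))
    if math.log(1.0 - rng.random()) < log_ratio:
        return prop
    return x


def sample(n_loops=50, n_local=20, n_global=20, seed=0):
    """Alternate local exploration, proposal fitting, and global jumps."""
    rng = random.Random(seed)
    chains = [-3.0, -3.0, 3.0, 3.0]  # two chains started in each mode
    history = [[] for _ in chains]
    for _ in range(n_loops):
        for _ in range(n_local):
            chains = [local_step(x, rng) for x in chains]
            for h, x in zip(history, chains):
                h.append(x)
        # "Training": refit the Gaussian stand-in to all samples collected so far.
        pooled = [x for h in history for x in h]
        mu = sum(pooled) / len(pooled)
        sigma = math.sqrt(sum((x - mu) ** 2 for x in pooled) / len(pooled))
        for _ in range(n_global):
            chains = [global_step(x, rng, mu, sigma) for x in chains]
            for h, x in zip(history, chains):
                h.append(x)
    return history


if __name__ == "__main__":
    history = sample()
    for i, h in enumerate(history):
        frac_right = sum(x > 0 for x in h) / len(h)
        print(f"chain {i}: fraction of samples in the right-hand mode = {frac_right:.2f}")
```

In this sketch the proposal keeps adapting for the whole run, which is convenient for illustration; a careful implementation would freeze the global proposal after a training phase before collecting the samples used for inference.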
-
Modelling Multi-relations for Convolutional-based Knowledge Graph Embedding
Authors:
Sirui Li,
Kok Wai Wong,
Dengya Zhu,
Chun Che Fung
Abstract:
Representation learning of knowledge graphs aims to embed entities and relations into low-dimensional vectors. Most existing works only consider the direct relations or paths between an entity pair. It is considered that such approaches disconnect the semantic connection of multi-relations between an entity pair, and we propose a convolutional and multi-relational representation learning model, ConvMR. The proposed ConvMR model addresses the multi-relation issue in two aspects: (1) Encoding the multi-relations between an entity pair into a unified vector that maintains the semantic connection. (2) Since not all relations are necessary while joining multi-relations, we propose an attention-based relation encoder to automatically assign weights to different relations based on semantic hierarchy. Experimental results on two popular datasets, FB15k-237 and WN18RR, show consistent improvements in mean rank. We also found that ConvMR deals efficiently with less frequent entities.
Submitted 20 October, 2022;
originally announced October 2022.
-
A Sun-like star orbiting a black hole
Authors:
Kareem El-Badry,
Hans-Walter Rix,
Eliot Quataert,
Andrew W. Howard,
Howard Isaacson,
Jim Fuller,
Keith Hawkins,
Katelyn Breivik,
Kaze W. K. Wong,
Antonio C. Rodriguez,
Charlie Conroy,
Sahar Shahaf,
Tsevi Mazeh,
Frédéric Arenou,
Kevin B. Burdge,
Dolev Bashi,
Simchon Faigler,
Daniel R. Weisz,
Rhys Seeburger,
Silvia Almada Monter,
Jennifer Wojno
Abstract:
We report discovery of a bright, nearby ($G = 13.8;\,\,d = 480\,\rm pc$) Sun-like star orbiting a dark object. We identified the system as a black hole candidate via its astrometric orbital solution from the Gaia mission. Radial velocities validated and refined the Gaia solution, and spectroscopy ruled out significant light contributions from another star. Joint modeling of radial velocities and astrometry constrains the companion mass to $M_2 = 9.62\pm 0.18\,M_{\odot}$. The spectroscopic orbit alone sets a minimum companion mass of $M_2>5\,M_{\odot}$; if the companion were a $5\,M_{\odot}$ star, it would be $500$ times more luminous than the entire system. These constraints are insensitive to the mass of the luminous star, which appears as a slowly-rotating G dwarf ($T_{\rm eff}=5850\,\rm K$, $\log g = 4.5$, $M=0.93\,M_{\odot}$), with near-solar metallicity ($\rm [Fe/H] = -0.2$) and an unremarkable abundance pattern. We find no plausible astrophysical scenario that can explain the orbit and does not involve a black hole. The orbital period, $P_{\rm orb}=185.6$ days, is longer than that of any known stellar-mass black hole binary. The system's modest eccentricity ($e=0.45$), high metallicity, and thin-disk Galactic orbit suggest that it was born in the Milky Way disk with at most a weak natal kick. How the system formed is uncertain. Common envelope evolution can only produce the system's wide orbit under extreme and likely unphysical assumptions. Formation models involving triples or dynamical assembly in an open cluster may be more promising. This is the nearest known black hole by a factor of 3, and its discovery suggests the existence of a sizable population of dormant black holes in binaries. Future Gaia releases will likely facilitate the discovery of dozens more.
Submitted 28 February, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Nonlinear effects in black hole ringdown
Authors:
Mark Ho-Yeuk Cheung,
Vishal Baibhav,
Emanuele Berti,
Vitor Cardoso,
Gregorio Carullo,
Roberto Cotesta,
Walter Del Pozzo,
Francisco Duque,
Thomas Helfer,
Estuti Shukla,
Kaze W. K. Wong
Abstract:
We report evidence for nonlinear modes in the ringdown stage of the gravitational waveform produced by the merger of two comparable-mass black holes. We consider both the coalescence of black hole binaries in quasicircular orbits and high-energy, head-on black hole collisions. The presence of nonlinear modes in the numerical simulations confirms that general-relativistic nonlinearities are important and must be considered in gravitational-wave data analysis.
Submitted 26 February, 2023; v1 submitted 15 August, 2022;
originally announced August 2022.
-
Relevance Judgment Convergence Degree -- A Measure of Inconsistency among Assessors for Information Retrieval
Authors:
Dengya Zhu,
Shastri L Nimmagadda,
Kok Wai Wong,
Torsten Reiners
Abstract:
Relevance judgment of human assessors is inherently subjective and dynamic when evaluation datasets are created for Information Retrieval (IR) systems. However, a small group of experts' relevance judgment results are usually taken as ground truth to "objectively" evaluate the performance of the IR systems. Recent trends intend to employ a group of judges, such as outsourcing, to alleviate the potentially biased judgment results stemming from using only a single expert's judgment. Nevertheless, different judges may have different opinions and may not agree with each other, and the inconsistency in human relevance judgment may affect the IR system evaluation results. In this research, we introduce a Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in the evaluation datasets. Experimental results reveal a strong correlation coefficient between the proposed RJCD score and the performance differences between the two IR systems.
Submitted 8 August, 2022;
originally announced August 2022.
-
Automated discovery of interpretable gravitational-wave population models
Authors:
Kaze W. K. Wong,
Miles Cranmer
Abstract:
We present an automatic approach to discover analytic population models for gravitational-wave (GW) events from data. As more gravitational-wave (GW) events are detected, flexible models such as Gaussian Mixture Models have become more important in fitting the distribution of GW properties due to their expressivity. However, flexible models come with many parameters that lack physical motivation, making interpreting the implication of these models challenging. In this work, we demonstrate symbolic regression can complement flexible models by distilling the posterior predictive distribution of such flexible models into interpretable analytic expressions. We recover common GW population models such as a power-law-plus-Gaussian, and find a new empirical population model which combines accuracy and simplicity. This demonstrates a strategy to automatically discover interpretable population models in the ever-growing GW catalog, which can potentially be applied to other astrophysical phenomena.
Submitted 25 July, 2022;
originally announced July 2022.
-
Hierarchical nuclear norm penalization for multi-view data
Authors:
Sangyoon Yi,
Raymond K. W. Wong,
Irina Gaynanova
Abstract:
The prevalence of data collected on the same set of samples from multiple sources (i.e., multi-view data) has prompted significant development of data integration methods based on low-rank matrix factorizations. These methods decompose signal matrices from each view into the sum of shared and individual structures, which are further used for dimension reduction, exploratory analyses, and quantifying associations across views. However, existing methods have limitations in modeling partially-shared structures due to either too restrictive models, or restrictive identifiability conditions. To address these challenges, we formulate a new model for partially-shared signals based on grouping the views into so-called hierarchical levels. The proposed hierarchy leads us to introduce a new penalty, hierarchical nuclear norm (HNN), for signal estimation. In contrast to existing methods, HNN penalization avoids scores and loadings factorization of the signals and leads to a convex optimization problem, which we solve using a dual forward-backward algorithm. We propose a simple refitting procedure to adjust the penalization bias and develop an adapted version of bi-cross-validation for selecting tuning parameters. Extensive simulation studies and analysis of the genotype-tissue expression data demonstrate the advantages of our method over existing alternatives.
Submitted 26 June, 2022;
originally announced June 2022.
-
Backward Population Synthesis: Mapping the Evolutionary History of Gravitational-Wave Progenitors
Authors:
Kaze W. K. Wong,
Katelyn Breivik,
Will M. Farr,
Rodrigo Luger
Abstract:
One promising way to extract information about stellar astrophysics from gravitational wave catalogs is to compare the catalog to the outputs of stellar population synthesis modeling with varying physical assumptions. The parameter space of physical assumptions in population synthesis is high-dimensional and the choice of parameters that best represents the evolution of a binary system may depend in an as-yet-to-be-determined way on the system's properties. Here we propose a pipeline to simultaneously infer zero-age main sequence properties and population synthesis parameter settings controlling modeled binary evolution from individual gravitational wave observations of merging compact binaries. Our pipeline can efficiently explore the high-dimensional space of population synthesis settings and progenitor system properties for each system in a catalog of gravitational wave observations. We apply our pipeline to observations in the third LIGO-Virgo Gravitational-Wave Transient Catalog. We showcase the effectiveness of this pipeline with a detailed study of the progenitor properties and population synthesis settings that produce mergers like the observed GW150914. Our pipeline permits a measurement of the variation of population synthesis parameter settings with binary properties, if any; we present inferences for the recent GWTC-3 transient catalog that suggest that the stable mass transfer efficiency parameter may vary with primary black hole mass.
Submitted 8 June, 2022;
originally announced June 2022.
-
Extending the Use of MDL for High-Dimensional Problems: Variable Selection, Robust Fitting, and Additive Modeling
Authors:
Zhenyu Wei,
Raymond K. W. Wong,
Thomas C. M. Lee
Abstract:
In the signal processing and statistics literature, the minimum description length (MDL) principle is a popular tool for choosing model complexity. Successful examples include signal denoising and variable selection in linear regression, for which the corresponding MDL solutions often enjoy consistent properties and produce very promising empirical results. This paper demonstrates that MDL can be extended naturally to the high-dimensional setting, where the number of predictors $p$ is larger than the number of observations $n$. It first considers the case of linear regression, then allows for outliers in the data, and lastly extends to the robust fitting of nonparametric additive models. Results from numerical experiments are presented to demonstrate the efficiency and effectiveness of the MDL approach.
Submitted 26 January, 2022;
originally announced January 2022.
-
Inferring the Intermediate Mass Black Hole Number Density from Gravitational Wave Lensing Statistics
Authors:
Joseph Gais,
Ken Ng,
Eungwang Seo,
Kaze W. K. Wong,
Tjonnie G. F. Li
Abstract:
The population properties of intermediate mass black holes remain largely unknown, and understanding their distribution could provide a missing link in the formation of supermassive black holes and galaxies. Gravitational wave observations can help fill in the gap from stellar mass black holes to supermassive black holes. In our work, we propose a new method for probing lens populations through lensing statistics of gravitational waves, here focusing on inferring the number density of intermediate mass black holes. Using hierarchical Bayesian inference of injected lensed gravitational waves, we find that existing gravitational wave observatories at design sensitivity could either identify an injected number density of $10^6 \mathrm{Mpc}^{-3}$ or place an upper bound of $\lesssim 10^4 \mathrm{Mpc}^{-3}$ for an injected $10^3 \mathrm{Mpc}^{-3}$. More broadly, our method could be applied to probe other forms of compact matter as well.
Submitted 11 January, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
The CAMELS project: public data release
Authors:
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Anglés-Alcázar,
Lucia A. Perez,
Pablo Villanueva-Domingo,
Digvijay Wadekar,
Helen Shao,
Faizan G. Mohammad,
Sultan Hassan,
Emily Moser,
Erwin T. Lau,
Luis Fernando Machado Poletti Valle,
Andrina Nicola,
Leander Thiele,
Yongseok Jo,
Oliver H. E. Philcox,
Benjamin D. Oppenheimer,
Megan Tillman,
ChangHoon Hahn,
Neerav Kaushal,
Alice Pisani,
Matthew Gebhardt,
Ana Maria Delgado,
Joyce Caliendo,
Christina Kreisch
, et al. (22 additional authors not shown)
Abstract:
The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations, 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-$α$ spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over one thousand catalogues that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz Semi-Analytic Model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.
Submitted 4 January, 2022;
originally announced January 2022.
-
Testing the robustness of simulation-based gravitational-wave population inference
Authors:
Damon H. T. Cheung,
Kaze W. K. Wong,
Otto A. Hannuksela,
Tjonnie G. F. Li,
Shirley Ho
Abstract:
Gravitational-wave population studies have become more important in gravitational-wave astronomy because of the rapid growth of the observed catalog. In recent studies, emulators based on different machine learning techniques have been used to emulate the outcomes of population synthesis simulations at high speed. In this study, we benchmark the performance of two emulators that learn the truncated power-law phenomenological model by using Gaussian process regression and normalizing flows techniques to see which one is a more capable likelihood emulator in the population inference. We benchmark the characteristics of the emulators by comparing their performance in the population inference to the phenomenological model using mock and real observation data. Our results suggest that the normalizing flows emulator can recover the posterior distribution by using the phenomenological model in the population inference with up to 300 mock injections. The normalizing flows emulator also underestimates the uncertainty for some posterior distributions in the population inference on real observation data. On the other hand, the Gaussian process regression emulator has poor performance on the same task and can only be used effectively in low-dimension cases.
Submitted 2 January, 2023; v1 submitted 13 December, 2021;
originally announced December 2021.
-
A New Constraint on the Nuclear Equation of State from Statistical Distributions of Compact Remnants of Supernovae
Authors:
Mikhail M. Meskhi,
Noah E. Wolfe,
Zhenyu Dai,
Carla Frohlich,
Jonah M. Miller,
Raymond K. W. Wong,
Ricardo Vilalta
Abstract:
Understanding how matter behaves at the highest densities and temperatures is a major open problem in both nuclear physics and relativistic astrophysics. This physics is often encapsulated in the so-called high-temperature nuclear equation of state, which influences compact binary mergers, core-collapse supernovae, and many more phenomena. One such case is the type (either black hole or neutron star) and mass of the remnant of the core collapse of a massive star. For each of six candidate equations of state, we use a very large suite of spherically symmetric supernova models to generate a suite of synthetic populations of such remnants. We then compare these synthetic populations to the observed remnant population. We thus provide a novel constraint on the high-temperature nuclear equation of state and describe which EOS candidates are more or less favored by this metric.
Submitted 27 March, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Adaptive Dynamic Sliding Mode Control of Soft Continuum Manipulators
Authors:
Amirhossein Kazemipour,
Oliver Fischer,
Yasunori Toshimitsu,
Ki Wan Wong,
Robert K. Katzschmann
Abstract:
Soft robots are made of compliant materials and perform tasks that are challenging for rigid robots. However, their continuum nature makes it difficult to develop model-based control strategies. This work presents a robust model-based control scheme for soft continuum robots. Our dynamic model is based on the Euler-Lagrange approach, but it uses a more accurate description of the robot's inertia and does not include oversimplified assumptions. Based on this model, we introduce an adaptive sliding mode control scheme, which is robust against model parameter uncertainties and unknown input disturbances. We perform a series of experiments with a physical soft continuum arm to evaluate the effectiveness of our controller at tracking task-space trajectories under different payloads. The tracking performance of the controller is around 38% more accurate than that of a state-of-the-art controller, i.e., the inverse dynamics method. Moreover, the proposed model-based control design is flexible and can be generalized to any continuum robotic arm with an arbitrary number of segments. With this control strategy, soft robotic object manipulation can become more accurate while remaining robust to disturbances.
Submitted 26 February, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence
Authors:
Francisco Villaescusa-Navarro,
Shy Genel,
Daniel Angles-Alcazar,
Leander Thiele,
Romeel Dave,
Desika Narayanan,
Andrina Nicola,
Yin Li,
Pablo Villanueva-Domingo,
Benjamin Wandelt,
David N. Spergel,
Rachel S. Somerville,
Jose Manuel Zorrilla Matilla,
Faizan G. Mohammad,
Sultan Hassan,
Helen Shao,
Digvijay Wadekar,
Michael Eickenberg,
Kaze W. K. Wong,
Gabriella Contardo,
Yongseok Jo,
Emily Moser,
Erwin T. Lau,
Luis Fernando Machado Poletti Valle,
Lucia A. Perez, et al. (3 additional authors not shown)
Abstract:
We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine learning models, CMD is the largest dataset of its kind containing more than 70 Terabytes of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.
Submitted 22 September, 2021;
originally announced September 2021.
-
Hunting intermediate-mass black holes with LISA binary radial velocity measurements
Authors:
Vladimir Strokov,
Giacomo Fragione,
Kaze W. K. Wong,
Thomas Helfer,
Emanuele Berti
Abstract:
Despite their potential role as massive seeds for quasars, in dwarf galaxy feedback, and in tidal disruption events, the observational evidence for intermediate-mass black holes (IMBHs) is scarce. LISA may observe stellar-mass black hole binaries orbiting Galactic IMBHs, and reveal the presence of the IMBH by measuring the Doppler shift in the gravitational waveform induced by the binary's radial velocity. We estimate the number of detectable Doppler shift events from the Milky Way globular clusters (assuming they host IMBHs) and we find that it decreases with the IMBH mass. A few Galactic globular clusters (including M22 and $ω$ Centauri) may produce at least one event detectable by LISA. Even in more pessimistic scenarios, one could still expect $\sim$ 1 event overall in the Milky Way. We also estimate the number of Doppler shift events for IMBHs wandering in the Milky Way as a result of the disruption of their parent clusters. If there is at least one binary black hole orbiting around each wandering IMBH, LISA may detect up to a few tens of Doppler shift events from this elusive IMBH population. Under more pessimistic assumptions, LISA may still detect $\sim 1$ wandering IMBH that would hardly be observable otherwise.
Submitted 27 April, 2022; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Projected State-action Balancing Weights for Offline Reinforcement Learning
Authors:
Jiayi Wang,
Zhengling Qi,
Raymond K. W. Wong
Abstract:
Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and the covariate balancing idea in causal inference, we propose a novel estimator with approximately projected state-action balancing weights for the policy value estimation. We obtain the convergence rate of these weights and show that the proposed value estimator is semi-parametric efficient under technical conditions. In terms of asymptotics, our results scale with both the number of trajectories and the number of decision points at each trajectory. As such, consistency can still be achieved with a limited number of subjects when the number of decision points diverges. In addition, we develop a necessary and sufficient condition for establishing the well-posedness of the Bellman operator in the off-policy setting, which characterizes the difficulty of OPE and may be of independent interest. Numerical experiments demonstrate the promising performance of our proposed estimator.
Submitted 9 June, 2022; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Implicit Sparse Regularization: The Impact of Depth and Early Stopping
Authors:
Jiangyuan Li,
Thanh V. Nguyen,
Chinmay Hegde,
Raymond K. W. Wong
Abstract:
In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter N, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization and step size. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window so that this implicit sparse regularization effect is more likely to take place.
Submitted 26 October, 2021; v1 submitted 12 August, 2021;
originally announced August 2021.
-
Discriminating between different scenarios for the formation and evolution of massive black holes with LISA
Authors:
Alexandre Toubiana,
Kaze W. K. Wong,
Stanislav Babak,
Enrico Barausse,
Emanuele Berti,
Jonathan R. Gair,
Sylvain Marsat,
Stephen R. Taylor
Abstract:
Electromagnetic observations have provided strong evidence for the existence of massive black holes in the center of galaxies, but their origin is still poorly known. Different scenarios for the formation and evolution of massive black holes lead to different predictions for their properties and merger rates. LISA observations of coalescing massive black hole binaries could be used to reverse engineer the problem and shed light on these mechanisms. In this paper, we introduce a pipeline based on hierarchical Bayesian inference to infer the mixing fraction between different theoretical models by comparing them to LISA observations of massive black hole mergers. By testing this pipeline against simulated LISA data, we show that it allows us to accurately infer the properties of the massive black hole population as long as our theoretical models provide a reliable description of the Universe. We also show that measurement errors, including both instrumental noise and weak lensing errors, have little impact on the inference.
Submitted 28 October, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Matrix Completion with Model-free Weighting
Authors:
Jiayi Wang,
Raymond K. W. Wong,
Xiaojun Mao,
Kwun Chuen Gary Chan
Abstract:
In this paper, we propose a novel method for matrix completion under general non-uniform missing structures. By controlling an upper bound of a novel balancing error, we construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities, and can be computed efficiently via convex optimization. The recovered matrix based on the proposed weighted empirical risk enjoys appealing theoretical guarantees. In particular, the proposed method achieves a stronger guarantee than existing work in terms of the scaling with respect to the observation probabilities, under asymptotically heterogeneous missing settings (where entry-wise observation probabilities can be of different orders). These settings can be regarded as a better theoretical model of missing patterns with highly varying probabilities. We also provide a new minimax lower bound under a class of heterogeneous settings. Numerical experiments are also provided to demonstrate the effectiveness of the proposed method.
Submitted 9 June, 2021;
originally announced June 2021.
-
Looking for the parents of LIGO's black holes
Authors:
Vishal Baibhav,
Emanuele Berti,
Davide Gerosa,
Matthew Mould,
Kaze W. K. Wong
Abstract:
Solutions to the two-body problem in general relativity allow us to predict the mass, spin and recoil velocity of a black-hole merger remnant given the masses and spins of its binary progenitors. In this paper we address the inverse problem: given a binary black-hole merger, can we use the parameters measured by gravitational-wave interferometers to tell if the binary components are of hierarchical origin, i.e. if they are themselves remnants of previous mergers? If so, can we determine at least some of the properties of their parents? This inverse problem is in general overdetermined. We show that hierarchical mergers occupy a characteristic region in the plane composed of the effective spin parameters $χ_{\rm eff}$ and $χ_{\rm p}$, and therefore a measurement of these parameters can add weight to the hierarchical-merger interpretation of some gravitational-wave events, including GW190521. If one of the binary components has hierarchical origin and its spin magnitude is well measured, we derive exclusion regions on the properties of its parents: for example we infer that the parents of GW190412 (if hierarchical) must have had unequal masses and low spins. Our formalism is quite general, and it can be used to infer constraints on the astrophysical environment producing hierarchical mergers.
Submitted 12 October, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Searching for a subpopulation of primordial black holes in LIGO/Virgo gravitational-wave data
Authors:
Gabriele Franciolini,
Vishal Baibhav,
Valerio De Luca,
Ken K. Y. Ng,
Kaze W. K. Wong,
Emanuele Berti,
Paolo Pani,
Antonio Riotto,
Salvatore Vitale
Abstract:
With several dozen binary black hole events detected by LIGO/Virgo to date and many more expected in the next few years, gravitational-wave astronomy is shifting from individual-event analyses to population studies. Using the GWTC-2 catalog, we perform a hierarchical Bayesian analysis that for the first time combines several state-of-the-art astrophysical formation models with a population of primordial black holes (PBHs) and constrains the fraction of a putative subpopulation of PBHs in the data. We find that this fraction depends significantly on the set of assumed astrophysical models. While a primordial population is statistically favored against certain competitive astrophysical channels, such as globular clusters and nuclear stellar clusters, a dominant contribution from the stable-mass-transfer isolated formation channel drastically reduces the need for PBHs, except for explaining the rate of mass-gap events like GW190521. The tantalizing possibility that black holes formed after inflation are contributing to LIGO/Virgo observations could only be verified by further reducing uncertainties in astrophysical and primordial formation models, and it may ultimately be confirmed by third-generation interferometers.
Submitted 4 May, 2022; v1 submitted 7 May, 2021;
originally announced May 2021.
-
SoPrA: Fabrication & Dynamical Modeling of a Scalable Soft Continuum Robotic Arm with Integrated Proprioceptive Sensing
Authors:
Yasunori Toshimitsu,
Ki Wan Wong,
Thomas Buchner,
Robert Katzschmann
Abstract:
Due to their inherent compliance, soft robots are more versatile than rigid linked robots when they interact with their environment, such as in object manipulation or biomimetic motion, and are considered the key element in introducing robots to everyday environments. Although various soft robotic actuators exist, past research has focused primarily on designing and analyzing single components. Limited effort has been made to combine each component to create an overall capable, integrated soft robot. Ideally, the behavior of such a robot can be accurately modeled, and its motion within an environment uses its proprioception, without requiring external sensors. This work presents a design and modeling process for a Soft continuum Proprioceptive Arm (SoPrA) actuated by pneumatics. The integrated design is suitable for an analytical model due to its internal capacitive flex sensor for proprioceptive measurements and its fiber-reinforced fluidic elastomer actuators. The proposed analytical dynamical model accounts for the inertial effects of the actuator's mass and the material properties, and predicts in real time the soft robot's behavior. Our estimation method integrates the analytical model with proprioceptive sensors to calculate external forces, all without relying on an external motion capture system. SoPrA is validated in a series of experiments demonstrating the model's and sensor's accuracy in estimation. SoPrA will enable soft arm manipulation including force sensing while operating in obstructed environments that disallow exteroceptive measurements.
Submitted 6 August, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
Estimation of Partially Conditional Average Treatment Effect by Hybrid Kernel-covariate Balancing
Authors:
Jiayi Wang,
Raymond K. W. Wong,
Shu Yang,
Kwun Chuen Gary Chan
Abstract:
We study nonparametric estimation for the partially conditional average treatment effect, defined as the treatment effect function over an interested subset of confounders. We propose a hybrid kernel weighting estimator where the weights aim to control the balancing error of any function of the confounders from a reproducing kernel Hilbert space after kernel smoothing over the subset of interested variables. In addition, we present an augmented version of our estimator which can incorporate estimations of outcome mean functions. Based on the representer theorem, gradient-based algorithms can be applied for solving the corresponding infinite-dimensional optimization problem. Asymptotic properties are studied without any smoothness assumptions for propensity score function or the need of data splitting, relaxing certain existing stringent assumptions. The numerical performance of the proposed estimator is demonstrated by a simulation study and an application to the effect of a mother's smoking on a baby's birth weight conditioned on the mother's age.
Submitted 4 March, 2021;
originally announced March 2021.
-
The VIX index under scrutiny of machine learning techniques and neural networks
Authors:
Ali Hirsa,
Joerg Osterrieder,
Branka Hadji Misheva,
Wenxin Cao,
Yiwen Fu,
Hanze Sun,
Kin Wai Wong
Abstract:
The CBOE Volatility Index, known by its ticker symbol VIX, is a popular measure of the market's expected volatility on the S&P 500 Index, calculated and published by the Chicago Board Options Exchange (CBOE). It is also often referred to as the fear index or the fear gauge. The current VIX index value quotes the expected annualized change in the S&P 500 index over the following 30 days, based on options-based theory and current options-market data. Despite its theoretical foundation in option price theory, CBOE's Volatility Index is prone to inadvertent and deliberate errors because it is a weighted average of out-of-the-money calls and puts which could be illiquid. Many claims of market manipulation have been brought up against VIX in recent years.
This paper discusses several approaches to replicate the VIX index as well as VIX futures by using a subset of relevant options as well as neural networks that are trained to automatically learn the underlying formula. Using subset selection approaches on top of the original CBOE methodology, as well as building machine learning and neural network models including Random Forests, Support Vector Machines, feed-forward neural networks, and long short-term memory (LSTM) models, we will show that a small number of options is sufficient to replicate the VIX index. Once we are able to actually replicate the VIX using a small number of S&P options we will be able to exploit potential arbitrage opportunities between the VIX index and its underlying derivatives. The results are intended to help investors better understand the options market, and more importantly, to give guidance to the US regulators and CBOE that have been investigating those manipulation claims for several years.
Submitted 3 February, 2021;
originally announced February 2021.
-
Joint constraints on the field-cluster mixing fraction, common envelope efficiency, and globular cluster radii from a population of binary hole mergers via deep learning
Authors:
Kaze W. K. Wong,
Katelyn Breivik,
Kyle Kremer,
Thomas Callister
Abstract:
The recent release of the second Gravitational-Wave Transient Catalog (GWTC-2) has increased significantly the number of known GW events, enabling unprecedented constraints on formation models of compact binaries. One pressing question is to understand the fraction of binaries originating from different formation channels, such as isolated field formation versus dynamical formation in dense stellar clusters. In this paper, we combine the $\texttt{COSMIC}$ binary population synthesis suite and the $\texttt{CMC}$ code for globular cluster evolution to create a mixture model for black hole binary formation under both formation scenarios. For the first time, these code bodies are combined self-consistently, with $\texttt{CMC}$ itself employing $\texttt{COSMIC}$ to track stellar evolution. We then use a deep-learning enhanced hierarchical Bayesian analysis to constrain the mixture fraction $f$ between formation models, while simultaneously constraining the common envelope efficiency $α$ assumed in $\texttt{COSMIC}$ and the initial cluster virial radius $r_v$ assumed in $\texttt{CMC}$. Under specific assumptions about other uncertain aspects of isolated binary and globular cluster evolution, we report the median and $90\%$ confidence interval of three physical parameters $(f,α,r_v)=(0.20^{+0.32}_{-0.18},2.26^{+2.65}_{-1.84},2.71^{+0.83}_{-1.17})$. This simultaneous constraint agrees with observed properties of globular clusters in the Milky Way and is an important first step in the pathway toward learning astrophysics of compact binary formation from GW observations.
Submitted 6 November, 2020;
originally announced November 2020.
-
Constraining the primordial black hole scenario with Bayesian inference and machine learning: the GWTC-2 gravitational wave catalog
Authors:
Kaze W. K. Wong,
Gabriele Franciolini,
Valerio De Luca,
Vishal Baibhav,
Emanuele Berti,
Paolo Pani,
Antonio Riotto
Abstract:
Primordial black holes (PBHs) might be formed in the early Universe and could comprise at least a fraction of the dark matter. Using the recently released GWTC-2 dataset from the third observing run of the LIGO-Virgo Collaboration, we investigate whether current observations are compatible with the hypothesis that all black hole mergers detected so far are of primordial origin. We constrain PBH formation models within a hierarchical Bayesian inference framework based on deep learning techniques, finding best-fit values for distinctive features of these models, including the PBH initial mass function, the fraction of PBHs in dark matter, and the accretion efficiency. The presence of several spinning binaries in the GWTC-2 dataset favors a scenario in which PBHs accrete and spin up. Our results indicate that PBHs may comprise only a fraction smaller than $0.3 \%$ of the total dark matter, and that the predicted PBH abundance is still compatible with other constraints.
Submitted 11 January, 2021; v1 submitted 3 November, 2020;
originally announced November 2020.