-
Estimating Probability Densities with Transformer and Denoising Diffusion
Authors: Henry W. Leung, Jo Bovy, Joshua S. Speagle
Abstract:
Transformers are often the go-to architecture to build foundation models that ingest a large amount of training data. But these models do not estimate the probability density distribution when trained on regression problems, yet obtaining full probabilistic outputs is crucial to many fields of science, where the probability distribution of the answer can be non-Gaussian and multimodal. In this work, we demonstrate that training a probabilistic model using a denoising diffusion head on top of the Transformer provides reasonable probability density estimation even for high-dimensional inputs. The combined Transformer+Denoising Diffusion model allows conditioning the output probability density on arbitrary combinations of inputs and it is thus a highly flexible density function emulator of all possible input/output combinations. We illustrate our Transformer+Denoising Diffusion model by training it on a large dataset of astronomical observations and measured labels of stars within our Galaxy and we apply it to a variety of inference tasks to show that the model can infer labels accurately with reasonable distributions.
Submitted 22 July, 2024;
originally announced July 2024.
-
Decoding the age-chemical structure of the Milky Way disk: An application of Copulas and Elicitable Maps
Authors: Aarya A. Patil, Jo Bovy, Sebastian Jaimungal, Neige Frankel, Henry W. Leung
Abstract:
In the Milky Way, the distribution of stars in the $[α/\mathrm{Fe}]$ vs. $[\mathrm{Fe/H}]$ and $[\mathrm{Fe/H}]$ vs. age planes holds essential information about the history of star formation, accretion, and dynamical evolution of the Galactic disk. We investigate these planes by applying novel statistical methods called copulas and elicitable maps to the ages and abundances of red giants in the APOGEE survey. We find that the low- and high-$α$ disk stars have a clean separation in copula space and use this to provide an automated separation of the $α$ sequences using a purely statistical approach. This separation reveals that the high-$α$ disk ends at the same $[α/\mathrm{Fe}]$ and age at high $[\mathrm{Fe/H}]$ as the low-$[\mathrm{Fe/H}]$ start of the low-$α$ disk, thus supporting a sequential formation scenario for the high- and low-$α$ disks. We then combine copulas with elicitable maps to precisely obtain the correlation between stellar age $τ$ and metallicity $[\mathrm{Fe/H}]$ conditional on Galactocentric radius $R$ and height $z$ in the range $0 < R < 20$ kpc and $|z| < 2$ kpc. The resulting trends in the age-metallicity correlation with radius, height, and $[α/\mathrm{Fe}]$ demonstrate a $\approx 0$ correlation wherever kinematically cold orbits dominate, while the naively expected negative correlation is present where kinematically hot orbits dominate. This is consistent with the effects of spiral-driven radial migration, which must be strong enough to completely flatten the age-metallicity structure of the low-$α$ disk.
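For readers unfamiliar with "copula space", the following is a minimal sketch of the standard rank transform that maps each variable to the unit interval, which leaves only the dependence structure between variables. It is an illustration under stated assumptions, not the authors' pipeline: the function names `pseudo_observations` and `empirical_copula` are hypothetical, and the sample values in the usage note are made up for demonstration.

```python
def pseudo_observations(values):
    """Map a sample to (0, 1) by rank: u_i = rank(x_i) / (n + 1).

    Ties are broken by input order (an assumption of this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_copula(u, v, a, b):
    """Fraction of points with u_i <= a and v_i <= b."""
    return sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / len(u)
```

As a usage example, `pseudo_observations(ages)` and `pseudo_observations(feh)` for two equal-length lists give coordinates in copula space. Because only ranks enter, any strictly increasing rescaling of either variable leaves these coordinates unchanged.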
Submitted 15 June, 2023;
originally announced June 2023.
-
Functional Data Analysis for Extracting the Intrinsic Dimensionality of Spectra: Application to Chemical Homogeneity in the Open Cluster M67
Authors: Aarya A. Patil, Jo Bovy, Gwendolyn Eadie, Sebastian Jaimungal
Abstract:
High-resolution spectroscopic surveys of the Milky Way have entered the Big Data regime and have opened avenues for solving outstanding questions in Galactic archaeology. However, exploiting their full potential is limited by complex systematics, whose characterization has not received much attention in modern spectroscopic analyses. In this work, we present a novel method to disentangle the component of spectral data space intrinsic to the stars from that due to systematics. Using functional principal component analysis on a sample of $18,933$ giant spectra from APOGEE, we find that the intrinsic structure above the level of observational uncertainties requires ${\approx}$10 functional principal components (FPCs). Our FPCs can reduce the dimensionality of spectra, remove systematics, and impute masked wavelengths, thereby enabling accurate studies of stellar populations. To demonstrate the applicability of our FPCs, we use them to infer stellar parameters and abundances of 28 giants in the open cluster M67. We employ Sequential Neural Likelihood, a simulation-based Bayesian inference method that learns likelihood functions using neural density estimators, to incorporate non-Gaussian effects in spectral likelihoods. By hierarchically combining the inferred abundances, we limit the spread of the following elements in M67: $\mathrm{Fe} \lesssim 0.02$ dex; $\mathrm{C} \lesssim 0.03$ dex; $\mathrm{O}, \mathrm{Mg}, \mathrm{Si}, \mathrm{Ni} \lesssim 0.04$ dex; $\mathrm{Ca} \lesssim 0.05$ dex; $\mathrm{N}, \mathrm{Al} \lesssim 0.07$ dex (at 68% confidence). Our constraints suggest a lack of self-pollution by core-collapse supernovae in M67, which has promising implications for the future of chemical tagging to understand the star formation history and dynamical evolution of the Milky Way.
Submitted 7 January, 2022; v1 submitted 22 September, 2021;
originally announced September 2021.
-
LRP2020: Astrostatistics in Canada
Authors: Gwendolyn Eadie, Arash Bahramian, Pauline Barmby, Radu Craiu, Derek Bingham, Renée Hložek, JJ Kavelaars, David Stenning, Samantha Benincasa, Guillaume Thomas, Karun Thanjavur, Jo Bovy, Jan Cami, Ray Carlberg, Sam Lawler, Adrian Liu, Henry Ngo, Mubdi Rahman, Michael Rupen
Abstract:
(Abridged from Executive Summary) This white paper focuses on the interdisciplinary fields of astrostatistics and astroinformatics, in which modern statistical and computational methods are applied to and developed for astronomical data. Astrostatistics and astroinformatics have grown dramatically in the past ten years, with international organizations, societies, conferences, workshops, and summer schools becoming the norm. Canada's formal role in astrostatistics and astroinformatics has been relatively limited, but there is a great opportunity and necessity for growth in this area. We conducted a survey of astronomers in Canada to gain information on the training mechanisms through which we learn statistical methods and to identify areas for improvement. In general, the results of our survey indicate that while astronomers see statistical methods as critically important for their research, they lack focused training in this area and wish they had received more formal training during all stages of education and professional development. These findings inform our recommendations for the LRP2020 on how to increase interdisciplinary connections between astronomy and statistics at the institutional, national, and international levels over the next ten years. We recommend specific, actionable ways to increase these connections, and discuss how interdisciplinary work can benefit not only research but also astronomy's role in training Highly Qualified Personnel (HQP) in Canada.
Submitted 19 October, 2019;
originally announced October 2019.
-
Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations
Authors: Jo Bovy, David W. Hogg, Sam T. Roweis
Abstract:
We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual $d$-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or "underlying" distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a "split-and-merge" procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite.
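The idea of fitting the underlying distribution while each point carries its own uncertainty can be sketched in one dimension. The code below is a simplified illustration, not the paper's implementation: it assumes scalar data with no missing components (so no projection is needed), omits the conjugate priors and the split-and-merge step, and uses hypothetical names (`xd_step_1d`, `fit_xd_1d`, `make_noisy_sample`). Here `s[i]` is the known noise variance of point `i`, and `alpha`, `m`, `V` are the mixture weights, means, and underlying variances.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def xd_step_1d(x, s, alpha, m, V):
    """One EM iteration for a 1-D Gaussian mixture with per-point noise variances s."""
    n, k = len(x), len(alpha)
    sum_r = [0.0] * k
    sum_rb = [0.0] * k
    stats = []
    for xi, si in zip(x, s):
        # Each component is convolved with this point's own noise variance.
        T = [V[j] + si for j in range(k)]
        w = [alpha[j] * normal_pdf(xi, m[j], T[j]) for j in range(k)]
        total = sum(w)
        r = [wj / total for wj in w]  # responsibilities
        # Posterior mean and variance of the noise-free value under component j.
        b = [m[j] + V[j] / T[j] * (xi - m[j]) for j in range(k)]
        B = [V[j] - V[j] ** 2 / T[j] for j in range(k)]
        stats.append((r, b, B))
        for j in range(k):
            sum_r[j] += r[j]
            sum_rb[j] += r[j] * b[j]
    new_alpha = [sum_r[j] / n for j in range(k)]
    new_m = [sum_rb[j] / sum_r[j] for j in range(k)]
    new_V = [
        sum(r[j] * ((new_m[j] - b[j]) ** 2 + B[j]) for r, b, B in stats) / sum_r[j]
        for j in range(k)
    ]
    return new_alpha, new_m, new_V


def fit_xd_1d(x, s, alpha, m, V, n_iter=100):
    """Run a fixed number of EM iterations (no convergence check in this sketch)."""
    for _ in range(n_iter):
        alpha, m, V = xd_step_1d(x, s, alpha, m, V)
    return alpha, m, V


def make_noisy_sample(n, seed=0):
    """Toy data: unit-variance true values plus heteroskedastic Gaussian noise."""
    rng = random.Random(seed)
    x, s = [], []
    for _ in range(n):
        sd = rng.uniform(0.5, 1.5)
        x.append(rng.gauss(0.0, 1.0) + rng.gauss(0.0, sd))
        s.append(sd ** 2)
    return x, s
```

With a single component initialized at the sample mean and variance of the noisy data, the fitted variance should come out smaller than the raw sample variance, since the per-point noise has been accounted for rather than absorbed into the model.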
Submitted 29 July, 2011; v1 submitted 19 May, 2009;
originally announced May 2009.