-
Correspondence of NNGP Kernel and the Matern Kernel
Authors:
Amanda Muyskens,
Benjamin W. Priest,
Imene R. Goumiri,
Michael D. Schneider
Abstract:
Kernels representing limiting cases of neural network architectures have recently gained popularity. However, the application and performance of these new kernels compared to existing options, such as the Matern kernel, are not well studied. We take a practical approach to explore the neural network Gaussian process (NNGP) kernel and its application to data in Gaussian process regression. We first demonstrate the necessity of normalization to produce valid NNGP kernels and explore related numerical challenges. We further demonstrate that the predictions from this model are quite inflexible, and therefore do not vary much over the valid hyperparameter sets. We then demonstrate a surprising result that the predictions given from the NNGP kernel correspond closely to those given by the Matern kernel under specific circumstances, which suggests a deep similarity between overparameterized deep neural networks and the Matern kernel. Finally, we demonstrate the performance of the NNGP kernel as compared to the Matern kernel on three benchmark data cases, and we conclude that for its flexibility and practical performance, the Matern kernel is preferred to the novel NNGP in practical applications.
Submitted 10 October, 2024;
originally announced October 2024.
-
Identifiability and Sensitivity Analysis of Kriging Weights for the Matern Kernel
Authors:
Amanda Muyskens,
Benjamin W. Priest,
Imene R. Goumiri,
Michael D. Schneider
Abstract:
Gaussian process (GP) models are effective non-linear models for numerous scientific applications. However, computation of their hyperparameters can be difficult when there is a large number of training observations (n) due to the O(n^3) cost of evaluating the likelihood function. Furthermore, non-identifiable hyperparameter values can induce difficulty in parameter estimation. Because of this, maximum likelihood estimation or Bayesian calibration is sometimes omitted and the hyperparameters are estimated with prediction-based methods such as a grid search using cross validation. Kriging, or prediction using a Gaussian process model, amounts to a weighted mean of the data, where training data close to the prediction location, as determined by the form and hyperparameters of the kernel matrix, are more highly weighted. Our analysis focuses on the commonly utilized Matern covariance function, of which the radial basis function (RBF) kernel is the limit as the smoothness parameter tends to infinity. We first perform a collinearity analysis to motivate identifiability issues between the parameters of the Matern covariance function. We also demonstrate which of its parameters can be estimated using only the predictions. Considering the kriging weights for fixed training data and a fixed prediction location as a function of the hyperparameters, we evaluate their sensitivities, as well as those of the predicted variance, with respect to said hyperparameters. We demonstrate that the smoothness parameter nu is the most sensitive parameter in determining the kriging weights, particularly when the nugget parameter is small, indicating this is the most important parameter to estimate. Finally, we demonstrate the impact of our conclusions on performance and accuracy in a classification problem using a latent Gaussian process model with the hyperparameters selected via a grid search.
Submitted 10 October, 2024;
originally announced October 2024.
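The abstract above describes kriging as a weighted mean of the training data, with weights that depend on the kernel hyperparameters. A minimal sketch of that idea follows, assuming one-dimensional inputs, unit prior variance, and only the smoothness values for which the Matern kernel has a simple closed form (nu = 1/2, 3/2, 5/2, and the RBF limit). The function names, the toy data, and the small Gaussian-elimination solver are illustrative assumptions and are not taken from the paper.

```python
import math

def matern(r, length_scale=1.0, nu=1.5):
    """Matern correlation at distance r, for closed-form smoothness values only."""
    s = abs(r) / length_scale
    if nu == 0.5:
        return math.exp(-s)
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return (1.0 + a + a * a / 3.0) * math.exp(-a)
    if math.isinf(nu):
        return math.exp(-0.5 * s * s)  # RBF kernel, the nu -> infinity limit
    raise ValueError("only nu in {0.5, 1.5, 2.5, inf} is supported in this sketch")

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kriging_weights(train_x, test_x, nu=1.5, length_scale=1.0, nugget=1e-6):
    """Weights w = (K + nugget * I)^{-1} k_*, so the prediction is sum(w_i * y_i)."""
    K = [[matern(a - b, length_scale, nu) + (nugget if i == j else 0.0)
          for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    k_star = [matern(a - test_x, length_scale, nu) for a in train_x]
    return solve(K, k_star)

# Toy data (hypothetical): four training locations and one prediction location.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 0.8, 0.9, 0.1]
for nu in (0.5, 1.5, 2.5, math.inf):
    w = kriging_weights(train_x, 1.4, nu=nu)
    pred = sum(wi * yi for wi, yi in zip(w, train_y))
    print(nu, [round(wi, 3) for wi in w], round(pred, 3))
```

Printing the weights for each smoothness value gives a rough feel for how they change with nu while the data and prediction location stay fixed. It is not a substitute for the sensitivity analysis the abstract describes.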
-
Light curve completion and forecasting using fast and scalable Gaussian processes (MuyGPs)
Authors:
Imène R. Goumiri,
Alec M. Dunton,
Amanda L. Muyskens,
Benjamin W. Priest,
Robert E. Armstrong
Abstract:
Temporal variations of apparent magnitude, called light curves, are observational statistics of interest captured by telescopes over long periods of time. Light curves afford the exploration of Space Domain Awareness (SDA) objectives such as object identification or pose estimation as latent variable inference problems. Ground-based observations from commercial off-the-shelf (COTS) cameras remain inexpensive compared to higher-precision instruments; however, limited sensor availability combined with noisier observations can produce gappy time-series data that can be difficult to model. These external factors confound the automated exploitation of light curves, which makes light curve prediction and extrapolation a crucial problem for applications. Traditionally, image or time-series completion problems have been approached with diffusion-based or exemplar-based methods. More recently, Deep Neural Networks (DNNs) have become the tool of choice due to their empirical success at learning complex nonlinear embeddings. However, DNNs often require large training data that are not necessarily available when looking at unique features of a light curve of a single satellite.
In this paper, we present a novel approach to predicting missing and future data points of light curves using Gaussian Processes (GPs). GPs are non-linear probabilistic models that infer posterior distributions over functions and naturally quantify uncertainty. However, the cubic scaling of GP inference and training is a major barrier to their adoption in applications. In particular, a single light curve can feature hundreds of thousands of observations, which is well beyond the practical realization limits of a conventional GP on a single machine. Consequently, we employ MuyGPs, a scalable framework for hyperparameter estimation of GP models that uses nearest neighbors sparsification and local cross-validation. MuyGPs...
Submitted 30 August, 2022;
originally announced August 2022.
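The nearest-neighbors sparsification mentioned in the abstract above can be illustrated with a small sketch: instead of conditioning on every observation, each prediction conditions only on the k observations closest in time, so each solve involves a k-by-k matrix. This is a generic local GP sketch under stated assumptions (a Matern 3/2 kernel, fixed hyperparameters, a synthetic sinusoid with a gap). It is not the MuyGPs implementation or its API, and all names below are hypothetical.

```python
import math

def matern32(r, length_scale):
    """Matern correlation with smoothness 3/2 at distance r."""
    a = math.sqrt(3.0) * abs(r) / length_scale
    return (1.0 + a) * math.exp(-a)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def local_gp_mean(times, values, t_query, k=8, length_scale=1.0, nugget=1e-4):
    """Zero-mean GP posterior mean at t_query using only the k nearest observations."""
    nearest = sorted(range(len(times)), key=lambda i: abs(times[i] - t_query))[:k]
    t = [times[i] for i in nearest]
    y = [values[i] for i in nearest]
    K = [[matern32(a - b, length_scale) + (nugget if i == j else 0.0)
          for j, b in enumerate(t)] for i, a in enumerate(t)]
    k_star = [matern32(a - t_query, length_scale) for a in t]
    weights = solve(K, k_star)  # only a k x k system, regardless of series length
    return sum(w * yi for w, yi in zip(weights, y))

# Synthetic "gappy" series (hypothetical): a sinusoid with samples removed for 4.0 <= t < 5.0.
times = [0.1 * i for i in range(100) if not 40 <= i < 50]
values = [math.sin(t) for t in times]
estimate = local_gp_mean(times, values, 4.5)
print(round(estimate, 3), round(math.sin(4.5), 3))
```

The per-prediction cost depends on k rather than on the total number of observations, which is the point of the sparsification. Hyperparameter estimation by local cross-validation, which the abstract also mentions, is not shown here.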
-
Star-Galaxy Image Separation with Computationally Efficient Gaussian Process Classification
Authors:
Amanda L. Muyskens,
Imène R. Goumiri,
Benjamin W. Priest,
Michael D. Schneider,
Robert E. Armstrong,
Jason M. Bernstein,
Ryan Dana
Abstract:
We introduce a novel method for discerning optical telescope images of stars from those of galaxies using Gaussian processes (GPs). Although applications of GPs often struggle in high-dimensional data modalities such as optical image classification, we show that a low-dimensional embedding of images into a metric space defined by the principal components of the data suffices to produce high-quality predictions from real large-scale survey data. We develop a novel method of GP classification hyperparameter training that scales approximately linearly in the number of image observations, which allows for application of GP models to large-size Hyper Suprime-Cam (HSC) Subaru Strategic Program data. In our experiments we evaluate the performance of a principal component analysis (PCA) embedded GP predictive model against other machine learning algorithms including a convolutional neural network and an image photometric morphology discriminator. Our analysis shows that our methods compare favorably with current methods in optical image classification while producing posterior distributions from the GP regression that can be used to quantify object classification uncertainty. We further describe how classification uncertainty can be used to efficiently parse large-scale survey imaging data to produce high-confidence object catalogs.
Submitted 3 May, 2021;
originally announced May 2021.
-
Star-Galaxy Separation via Gaussian Processes with Model Reduction
Authors:
Imène R. Goumiri,
Amanda L. Muyskens,
Michael D. Schneider,
Benjamin W. Priest,
Robert E. Armstrong
Abstract:
Modern cosmological surveys such as the Hyper Suprime-Cam (HSC) survey produce a huge volume of low-resolution images of both distant galaxies and dim stars in our own galaxy. Being able to automatically classify these images is a long-standing problem in astronomy and critical to a number of different scientific analyses. Recently, the challenge of "star-galaxy" classification has been approached with Deep Neural Networks (DNNs), which are good at learning complex nonlinear embeddings. However, DNNs are known to overconfidently extrapolate on unseen data and require a large volume of training images that accurately capture the data distribution to be considered reliable. Gaussian Processes (GPs), which infer posterior distributions over functions and naturally quantify uncertainty, haven't been a tool of choice for this task mainly because popular kernels exhibit limited expressivity on complex and high-dimensional data.
In this paper, we present a novel approach to the star-galaxy separation problem that uses GPs and reaps their benefits while solving many of the issues traditionally affecting them for classification of high-dimensional celestial image data. After an initial filtering of the raw data of star and galaxy image cutouts, we first reduce the dimensionality of the input images by using a Principal Components Analysis (PCA) before applying GPs using a simple Radial Basis Function (RBF) kernel on the reduced data. Using this method, we greatly improve the accuracy of the classification over a basic application of GPs while improving the computational efficiency and scalability of the method.
Submitted 12 October, 2020;
originally announced October 2020.
-
Simultaneous feedback control of toroidal magnetic field and plasma current on MST using advanced programmable power supplies
Authors:
I. R. Goumiri,
K. J. McCollam,
A. A. Squitieri,
D. J. Holly,
J. S. Sarff,
S. P. Leblanc
Abstract:
Programmable control of the inductive electric field enables advanced operations of reversed-field pinch (RFP) plasmas in the Madison Symmetric Torus (MST) device and further develops the technical basis for ohmically heated fusion RFP plasmas. MST's poloidal and toroidal magnetic fields ($B_\text{p}$ and $B_\text{t}$) can be sourced by programmable power supplies (PPSs) based on insulated-gate bipolar transistors (IGBTs). In order to provide real-time simultaneous control of both $B_\text{p}$ and $B_\text{t}$ circuits, a time-independent integrated model is developed. The actuators considered for the control are the $B_\text{p}$ and $B_\text{t}$ primary currents produced by the PPSs. The control system goal will be tracking two particular demand quantities that can be measured at the plasma surface ($r=a$): the plasma current, $I_\text{p} \sim B_\text{p}(a)$, and the RFP reversal parameter, $F\sim B_\text{t}(a)/Φ$, where $Φ$ is the toroidal flux in the plasma. The edge safety factor, $q(a)\propto B_\text{t}(a)$, tends to track $F$ but not identically. To understand the responses of $I_\text{p}$ and $F$ to the actuators and to enable systematic design of control algorithms, dedicated experiments are run in which the actuators are modulated, and a linearized dynamic data-driven model is generated using a system identification method. We perform a series of initial real-time experiments to test the designed feedback controllers and validate the derived model predictions. The feedback controllers show systematic improvements over simpler feedforward controllers.
Submitted 12 August, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Quantum Machine Learning using Gaussian Processes with Performant Quantum Kernels
Authors:
Matthew Otten,
Imène R. Goumiri,
Benjamin W. Priest,
George F. Chapline,
Michael D. Schneider
Abstract:
Quantum computers have the opportunity to be transformative for a variety of computational tasks. Recently, there have been proposals to use the unsimulatability of large quantum devices to perform regression, classification, and other machine learning tasks with quantum advantage by using kernel methods. While unsimulatability is a necessary condition for quantum advantage in machine learning, it is not sufficient, as not all kernels are equally effective. Here, we study the use of quantum computers to perform the machine learning tasks of one- and multi-dimensional regression, as well as reinforcement learning, using Gaussian Processes. By using approximations of performant classical kernels enhanced with extra quantum resources, we demonstrate that quantum devices, both in simulation and on hardware, can perform machine learning tasks at least as well as, and many times better than, the classical inspiration. Our informed kernel design demonstrates a path towards effectively utilizing quantum devices for machine learning tasks.
Submitted 23 April, 2020;
originally announced April 2020.
-
Reinforcement Learning via Gaussian Processes with Neural Network Dual Kernels
Authors:
Imène R. Goumiri,
Benjamin W. Priest,
Michael D. Schneider
Abstract:
While deep neural networks (DNNs) and Gaussian Processes (GPs) are both popularly utilized to solve problems in reinforcement learning, both approaches feature undesirable drawbacks for challenging problems. DNNs learn complex nonlinear embeddings, but do not naturally quantify uncertainty and are often data-inefficient to train. GPs infer posterior distributions over functions, but popular kernels exhibit limited expressivity on complex and high-dimensional data. Fortunately, recently discovered conjugate and neural tangent kernel functions encode the behavior of overparameterized neural networks in the kernel domain. We demonstrate that these kernels can be efficiently applied to regression and reinforcement learning problems by analyzing a baseline case study. We apply GPs with neural network dual kernels to solve reinforcement learning tasks for the first time. We demonstrate, using the well-understood mountain-car problem, that GPs empowered with dual kernels perform at least as well as those using the conventional radial basis function kernel. We conjecture that by inheriting the probabilistic rigor of GPs and the powerful embedding properties of DNNs, GPs using NN dual kernels will empower future reinforcement learning models on difficult domains.
Submitted 10 April, 2020;
originally announced April 2020.