-
Correspondence of NNGP Kernel and the Matern Kernel
Authors:
Amanda Muyskens,
Benjamin W. Priest,
Imene R. Goumiri,
Michael D. Schneider
Abstract:
Kernels representing limiting cases of neural network architectures have recently gained popularity. However, the application and performance of these new kernels compared to existing options, such as the Matern kernel, are not well studied. We take a practical approach to explore the neural network Gaussian process (NNGP) kernel and its application to data in Gaussian process regression. We first demonstrate the necessity of normalization to produce valid NNGP kernels and explore related numerical challenges. We further demonstrate that the predictions from this model are quite inflexible, and therefore do not vary much over the valid hyperparameter sets. We then demonstrate a surprising result that the predictions given by the NNGP kernel correspond closely to those given by the Matern kernel under specific circumstances, which suggests a deep similarity between overparameterized deep neural networks and the Matern kernel. Finally, we demonstrate the performance of the NNGP kernel as compared to the Matern kernel on three benchmark data cases, and we conclude that for its flexibility and practical performance, the Matern kernel is preferred to the novel NNGP in practical applications.
Submitted 10 October, 2024;
originally announced October 2024.
-
Identifiability and Sensitivity Analysis of Kriging Weights for the Matern Kernel
Authors:
Amanda Muyskens,
Benjamin W. Priest,
Imene R. Goumiri,
Michael D. Schneider
Abstract:
Gaussian process (GP) models are effective non-linear models for numerous scientific applications. However, computation of their hyperparameters can be difficult when there is a large number of training observations (n) due to the O(n^3) cost of evaluating the likelihood function. Furthermore, non-identifiable hyperparameter values can induce difficulty in parameter estimation. Because of this, maximum likelihood estimation or Bayesian calibration is sometimes omitted and the hyperparameters are estimated with prediction-based methods such as a grid search using cross validation. Kriging, or prediction using a Gaussian process model, amounts to a weighted mean of the data, where training data close to the prediction location, as determined by the form and hyperparameters of the kernel matrix, are more highly weighted. Our analysis focuses on the commonly used Matern covariance function, of which the radial basis function (RBF) kernel is the limit as the smoothness parameter tends to infinity. We first perform a collinearity analysis to motivate identifiability issues between the parameters of the Matern covariance function. We also demonstrate which of its parameters can be estimated using only the predictions. Considering the kriging weights for fixed training data and a fixed prediction location as a function of the hyperparameters, we evaluate their sensitivities, as well as those of the predicted variance, with respect to said hyperparameters. We demonstrate that the smoothness parameter nu is the most sensitive parameter in determining the kriging weights, particularly when the nugget parameter is small, indicating this is the most important parameter to estimate. Finally, we demonstrate the impact of our conclusions on performance and accuracy in a classification problem using a latent Gaussian process model with the hyperparameters selected via a grid search.
Submitted 10 October, 2024;
originally announced October 2024.
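The abstract above describes kriging as a weighted mean of the training data, with weights set by the Matern kernel and its hyperparameters. A minimal sketch of that idea follows. It is not the authors' code: the function names `matern` and `kriging_weights`, the unit signal variance, the zero prior mean, the one-dimensional inputs, and the restriction to the half-integer smoothness values that have closed forms are all assumptions made for illustration.

```python
import numpy as np


def matern(r, length_scale=1.0, nu=1.5):
    """Matern correlation at distance r (closed forms for half-integer nu only)."""
    s = np.asarray(r, dtype=float) / length_scale
    if nu == 0.5:
        return np.exp(-s)
    if nu == 1.5:
        a = np.sqrt(3.0) * s
        return (1.0 + a) * np.exp(-a)
    if nu == 2.5:
        a = np.sqrt(5.0) * s
        return (1.0 + a + a**2 / 3.0) * np.exp(-a)
    raise ValueError("this sketch only implements nu in {0.5, 1.5, 2.5}")


def kriging_weights(x_train, x_star, length_scale=1.0, nu=1.5, nugget=1e-6):
    """Return the kriging weights and predictive variance at x_star.

    Assumes a zero-mean GP with unit signal variance on 1-D inputs
    (hypothetical simplifications, not taken from the paper).
    """
    x_train = np.asarray(x_train, dtype=float)
    # Kernel matrix between training points, with the nugget on the diagonal.
    dists = np.abs(x_train[:, None] - x_train[None, :])
    K = matern(dists, length_scale, nu) + nugget * np.eye(len(x_train))
    # Cross-covariance between the training points and the prediction location.
    k_star = matern(np.abs(x_train - x_star), length_scale, nu)
    # Weights solve (K + nugget * I) w = k_star; the prediction is w @ y.
    weights = np.linalg.solve(K, k_star)
    variance = 1.0 - k_star @ weights
    return weights, variance


x_train = np.array([0.0, 0.5, 1.0, 3.0])
y_train = np.sin(x_train)
weights, variance = kriging_weights(x_train, x_star=0.6)
prediction = weights @ y_train  # the kriging mean is a weighted sum of the data
print(weights, variance, prediction)
```

Re-running `kriging_weights` while varying `nu`, `length_scale`, or `nugget` one at a time gives a rough feel for how the weights respond to each hyperparameter, which is the quantity the sensitivity analysis in the abstract is concerned with.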
-
Rare Events via Cross-Entropy Population Monte Carlo
Authors:
Caleb Miller,
Jem N. Corcoran,
Michael D. Schneider
Abstract:
We present a Cross-Entropy based population Monte Carlo algorithm. This method stands apart from previous work in that we are not optimizing a mixture distribution. Instead, we leverage deterministic mixture weights and optimize the distributions individually through a reinterpretation of the typical derivation of the cross-entropy method. Demonstrations on numerical examples show that the algorithm can outperform existing resampling population Monte Carlo methods, especially for higher-dimensional problems.
Submitted 11 October, 2021;
originally announced October 2021.
-
Bayesian Fusion of Data Partitioned Particle Estimates
Authors:
Caleb Miller,
Michael D. Schneider,
Jem N. Corcoran,
Jason Bernstein
Abstract:
We present a Bayesian data fusion method to approximate a posterior distribution from an ensemble of particle estimates that only have access to subsets of the data. Our approach relies on approximate probabilistic inference of model parameters through Monte Carlo methods, followed by an update and resample scheme related to multiple importance sampling to combine information from the initial estimates. We show the method is convergent in the particle limit and directly suited to application on multi-sensor data fusion problems by demonstrating efficacy on a multi-sensor Keplerian orbit determination problem and a bearings-only tracking problem.
Submitted 26 October, 2020;
originally announced October 2020.
-
Reinforcement Learning via Gaussian Processes with Neural Network Dual Kernels
Authors:
Imène R. Goumiri,
Benjamin W. Priest,
Michael D. Schneider
Abstract:
While deep neural networks (DNNs) and Gaussian Processes (GPs) are both popularly utilized to solve problems in reinforcement learning, both approaches feature undesirable drawbacks for challenging problems. DNNs learn complex nonlinear embeddings, but do not naturally quantify uncertainty and are often data-inefficient to train. GPs infer posterior distributions over functions, but popular kernels exhibit limited expressivity on complex and high-dimensional data. Fortunately, recently discovered conjugate and neural tangent kernel functions encode the behavior of overparameterized neural networks in the kernel domain. We demonstrate that these kernels can be efficiently applied to regression and reinforcement learning problems by analyzing a baseline case study. We apply GPs with neural network dual kernels to solve reinforcement learning tasks for the first time. We demonstrate, using the well-understood mountain-car problem, that GPs empowered with dual kernels perform at least as well as those using the conventional radial basis function kernel. We conjecture that by inheriting the probabilistic rigor of GPs and the powerful embedding properties of DNNs, GPs using NN dual kernels will empower future reinforcement learning models on difficult domains.
Submitted 10 April, 2020;
originally announced April 2020.