-
Synthesizing data products, mathematical models, and observational measurements for lake temperature forecasting
Authors:
Maike F. Holthuijzen,
Robert B. Gramacy,
Cayelan C. Carey,
Dave M. Higdon,
R. Quinn Thomas
Abstract:
We present a novel forecasting framework for lake water temperature, which is crucial for managing lake ecosystems and drinking water resources. The General Lake Model (GLM) has been previously used for this purpose, but, similar to many process-based simulation models, it: requires a large number of inputs, many of which are stochastic; presents challenges for uncertainty quantification (UQ); and…
▽ More
We present a novel forecasting framework for lake water temperature, which is crucial for managing lake ecosystems and drinking water resources. The General Lake Model (GLM) has been previously used for this purpose, but, similar to many process-based simulation models, it: requires a large number of inputs, many of which are stochastic; presents challenges for uncertainty quantification (UQ); and can exhibit model bias. To address these issues, we propose a Gaussian process (GP) surrogate-based forecasting approach that efficiently handles large, high-dimensional data and accounts for input-dependent variability and systematic GLM bias. We validate the proposed approach and compare it with other forecasting methods, including a climatological model and raw GLM simulations. Our results demonstrate that our bias-corrected GP surrogate (GPBC) can outperform competing approaches in terms of forecast accuracy and UQ up to two weeks into the future.
△ Less
Submitted 25 February, 2025; v1 submitted 3 July, 2024;
originally announced July 2024.
-
A Note on Using Discretized Simulated Data to Estimate Implicit Likelihoods in Bayesian Analyses
Authors:
M. S. Hamada,
T. L. Graves,
N. W. Hengartner,
D. M. Higdon,
A. V. Huzurbazar,
E. C. Lawrence,
C. D. Linkletter,
C. S. Reese,
D. W. Scott,
R. R. Sitter,
R. L. Warr,
B. J. Williams
Abstract:
This article presents a Bayesian inferential method where the likelihood for a model is unknown but where data can easily be simulated from the model. We discretize simulated (continuous) data to estimate the implicit likelihood in a Bayesian analysis employing a Markov chain Monte Carlo algorithm. Three examples are presented as well as a small study on some of the method's properties.
This article presents a Bayesian inferential method where the likelihood for a model is unknown but where data can easily be simulated from the model. We discretize simulated (continuous) data to estimate the implicit likelihood in a Bayesian analysis employing a Markov chain Monte Carlo algorithm. Three examples are presented as well as a small study on some of the method's properties.
△ Less
Submitted 6 August, 2020;
originally announced August 2020.
-
Discovery of Physics from Data: Universal Laws and Discrepancies
Authors:
Brian M. de Silva,
David M. Higdon,
Steven L. Brunton,
J. Nathan Kutz
Abstract:
Machine learning (ML) and artificial intelligence (AI) algorithms are now being used to automate the discovery of physics principles and governing equations from measurement data alone. However, positing a universal physical law from data is challenging without simultaneously proposing an accompanying discrepancy model to account for the inevitable mismatch between theory and measurements. By revi…
▽ More
Machine learning (ML) and artificial intelligence (AI) algorithms are now being used to automate the discovery of physics principles and governing equations from measurement data alone. However, positing a universal physical law from data is challenging without simultaneously proposing an accompanying discrepancy model to account for the inevitable mismatch between theory and measurements. By revisiting the classic problem of modeling falling objects of different size and mass, we highlight a number of nuanced issues that must be addressed by modern data-driven methods for automated physics discovery. Specifically, we show that measurement noise and complex secondary physical mechanisms, like unsteady fluid drag forces, can obscure the underlying law of gravitation, leading to an erroneous model. We use the sparse identification of nonlinear dynamics (SINDy) method to identify governing equations for real-world measurement data and simulated trajectories. Incorporating into SINDy the assumption that each falling object is governed by a similar physical law is shown to improve the robustness of the learned models, but discrepancies between the predictions and observations persist due to subtleties in drag dynamics. This work highlights the fact that the naive application of ML/AI will generally be insufficient to infer universal physical laws without further modification.
△ Less
Submitted 16 April, 2020; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Calibration of Computational Models with Categorical Parameters and Correlated Outputs via Bayesian Smoothing Spline ANOVA
Authors:
Curtis B. Storlie,
William A. Lane,
Emily M. Ryan,
James R. Gattiker,
David M. Higdon
Abstract:
It has become commonplace to use complex computer models to predict outcomes in regions where data does not exist. Typically these models need to be calibrated and validated using some experimental data, which often consists of multiple correlated outcomes. In addition, some of the model parameters may be categorical in nature, such as a pointer variable to alternate models (or submodels) for some…
▽ More
It has become commonplace to use complex computer models to predict outcomes in regions where data does not exist. Typically these models need to be calibrated and validated using some experimental data, which often consists of multiple correlated outcomes. In addition, some of the model parameters may be categorical in nature, such as a pointer variable to alternate models (or submodels) for some of the physics of the system. Here we present a general approach for calibration in such situations where an emulator of the computationally demanding models and a discrepancy term from the model to reality are represented within a Bayesian Smoothing Spline (BSS) ANOVA framework. The BSS-ANOVA framework has several advantages over the traditional Gaussian Process, including ease of handling categorical inputs and correlated outputs, and improved computational efficiency. Finally this framework is then applied to the problem that motivated its design; a calibration of a computational fluid dynamics model of a bubbling fluidized which is used as an absorber in a CO2 capture system.
△ Less
Submitted 17 June, 2014; v1 submitted 21 May, 2014;
originally announced May 2014.
-
Parallel Bayesian Additive Regression Trees
Authors:
Matthew T. Pratola,
Hugh A. Chipman,
James R. Gattiker,
David M. Higdon,
Robert McCulloch,
William N. Rust
Abstract:
Bayesian Additive Regression Trees (BART) is a Bayesian approach to flexible non-linear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov Chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation acr…
▽ More
Bayesian Additive Regression Trees (BART) is a Bayesian approach to flexible non-linear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov Chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this paper we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
△ Less
Submitted 7 September, 2013;
originally announced September 2013.