-
A clustered Gaussian process model for computer experiments
Authors:
Chih-Li Sung,
Benjamin Haaland,
Youngdeok Hwang,
Siyuan Lu
Abstract:
A Gaussian process has been an important approach for emulating computer simulations. However, the stationarity assumption of a Gaussian process and its intractability for large-scale datasets limit its applicability in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters and fits a Gaussian process model within each cluster. The stochastic expectation-maximization algorithm is employed to fit the model efficiently. In our simulations, as well as in a real application to solar irradiance emulation, the proposed method had smaller mean squared errors than its main competitors, with competitive computation time, and provided valuable insights from the data by discovering the clusters. An R package for the proposed methodology is provided in an open repository.
Submitted 5 November, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Synthesizing simulation and field data of solar irradiance
Authors:
Furong Sun,
Robert B. Gramacy,
Benjamin Haaland,
Siyuan Lu,
Youngdeok Hwang
Abstract:
Predicting the intensity and amount of sunlight as a function of location and time is an essential component in identifying promising locations for economical solar farming. Although weather models and irradiance data are relatively abundant, these have yet, to our knowledge, been hybridized on a continental scale. Rather, much of the emphasis in the literature has been on short-term localized forecasting. This is probably because the amount of data involved in a more global analysis is prohibitive for the canonical toolkit, the Gaussian process (GP). Here we show how GP surrogate and discrepancy models can be combined to tractably and accurately predict solar irradiance on time-aggregated and daily scales with measurements at thousands of sites across the continental United States. Our results establish the short-term accuracy of bias-corrected weather-based simulation of irradiance when realizations are available in real space-time (e.g., in future days), and provide accurate surrogates for smoothing in the more common situation where reliable weather data are not available (e.g., in future years).
Submitted 22 June, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
-
Emulating satellite drag from large simulation experiments
Authors:
Furong Sun,
Robert B. Gramacy,
Benjamin Haaland,
Earl Lawrence,
Andrew Walker
Abstract:
Obtaining accurate estimates of satellite drag coefficients in low Earth orbit is a crucial component in positioning and collision avoidance. Simulators can produce accurate estimates, but their computational expense is much too large for real-time application. A pilot study showed that Gaussian process (GP) surrogate models could accurately emulate simulations. However, the cubic runtime for training GPs means that they could only be applied to a narrow range of input configurations to achieve the desired level of accuracy. In this paper we show how extensions to the local approximate Gaussian process (laGP) method allow accurate full-scale emulation. The new methodological contributions, which involve a multi-level global/local modeling approach and a set-wise approach to local subset selection, are shown to perform well in benchmark and synthetic data settings. We conclude by demonstrating that our method achieves the desired level of accuracy, besting simpler viable (i.e., computationally tractable) global and local modeling approaches, when trained on seventy thousand core hours of drag simulations for two real-world satellites: the Hubble Space Telescope (HST) and the Gravity Recovery and Climate Experiment (GRACE).
Submitted 22 June, 2019; v1 submitted 30 November, 2017;
originally announced December 2017.
-
Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments
Authors:
Chih-Li Sung,
Wenjia Wang,
Matthew Plumlee,
Benjamin Haaland
Abstract:
The Gaussian process is a standard tool for building emulators for both deterministic and stochastic computer experiments. However, the application of Gaussian process models is greatly limited in practice, particularly for the large-scale and many-input computer experiments that have become typical. We propose a multi-resolution functional ANOVA model as a computationally feasible emulation alternative. More generally, this model can be used for large-scale and many-input non-linear regression problems. An overlapping group lasso approach is used for estimation, ensuring computational feasibility in a large-scale and many-input setting. New results on consistency and inference for the (potentially overlapping) group lasso in a high-dimensional setting are developed and applied to the proposed multi-resolution functional ANOVA model. Importantly, these results allow us to quantify the uncertainty in our predictions. Numerical examples demonstrate that the proposed model enjoys marked computational advantages. Its data capabilities, in terms of both sample size and dimension, meet or exceed those of the best available emulation tools, while its emulation accuracy matches or exceeds theirs.
Submitted 8 January, 2019; v1 submitted 20 September, 2017;
originally announced September 2017.
-
Potentially Predictive Variance Reducing Subsample Locations in Local Gaussian Process Regression
Authors:
Chih-Li Sung,
Robert B. Gramacy,
Benjamin Haaland
Abstract:
Gaussian process models are commonly used as emulators for computer experiments. However, developing a Gaussian process emulator can be computationally prohibitive when the number of experimental samples is even moderately large. Local Gaussian process approximation (Gramacy and Apley, 2015) was proposed as an accurate and computationally feasible emulation alternative. However, constructing local sub-designs specific to predictions at a particular location of interest remains a substantial computational bottleneck to the technique. In this paper, two computationally efficient techniques for limiting the neighborhood search are proposed: a maximum distance method and a feature approximation method. Two examples demonstrate that the proposed methods indeed save substantial computation while retaining emulation accuracy.
Submitted 26 November, 2016; v1 submitted 18 April, 2016;
originally announced April 2016.
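The local approximation idea referenced above (Gramacy and Apley, 2015) can be illustrated with a minimal sketch: predict at a single location using a Gaussian process fitted only to the nearest design points. This is not the laGP package, nor the maximum distance or feature approximation methods of the paper; it assumes a zero-mean GP, a squared-exponential correlation with fixed, hypothetical values of `theta` and `nugget` (no hyperparameter estimation), and a plain nearest-neighbour sub-design. The function names are illustrative only.

```python
import numpy as np

def sq_exp_corr(A, B, theta):
    """Squared-exponential correlation exp(-||a - b||^2 / theta) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / theta)

def local_gp_predict(X, y, x_star, n_local=30, theta=0.1, nugget=1e-6):
    """Hypothetical helper: zero-mean GP prediction at x_star from its n_local nearest points.

    theta and nugget are fixed assumed values. Returns the predictive mean and
    variance, using the plug-in scale estimate tau2 = y' K^{-1} y / n.
    """
    dist = np.linalg.norm(X - x_star, axis=1)
    idx = np.argsort(dist)[:n_local]              # nearest-neighbour sub-design
    Xn, yn = X[idx], y[idx]
    K = sq_exp_corr(Xn, Xn, theta) + nugget * np.eye(len(idx))
    k = sq_exp_corr(Xn, x_star[None, :], theta)[:, 0]
    L = np.linalg.cholesky(K)                     # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yn))  # K^{-1} y
    v = np.linalg.solve(L, k)                     # L^{-1} k, so v @ v = k' K^{-1} k
    tau2 = yn @ alpha / len(idx)                  # plug-in scale estimate
    mean = k @ alpha
    var = tau2 * (1.0 + nugget - v @ v)
    return mean, var

# Usage on a synthetic smooth function with 2000 design points.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 2))
y = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1])
mean, var = local_gp_predict(X, y, np.array([0.5, 0.5]))
```

Only an `n_local` by `n_local` system is factorized per prediction, which is why independent predictions at many locations can be computed cheaply and in parallel; choosing which points enter the sub-design is the step the paper targets.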
-
Speeding up neighborhood search in local Gaussian process prediction
Authors:
Robert B. Gramacy,
Benjamin Haaland
Abstract:
Recent implementations of local approximate Gaussian process models have pushed computational boundaries for non-linear, non-parametric prediction problems, particularly when deployed as emulators for computer experiments. Their flavor of spatially independent computation accommodates massive parallelization, meaning that they can handle designs two or more orders of magnitude larger than was previously possible. However, accomplishing that feat can still require massive supercomputing resources. Here we aim to ease that burden. We study how predictive variance is reduced as local designs are built up for prediction. We then observe how the exhaustive and discrete nature of an important search subroutine involved in building such local designs may be overly conservative. Rather, we suggest that searching the space radially, i.e., continuously along rays emanating from the predictive location of interest, is a far thriftier alternative. Our empirical work demonstrates that ray-based search yields predictors with accuracy comparable to exhaustive search, but in a fraction of the time, bringing a supercomputer implementation back onto the desktop.
Submitted 5 January, 2015; v1 submitted 29 August, 2014;
originally announced September 2014.
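The exhaustive, discrete search step described above can be sketched as a greedy loop: seed the local design with a few nearest neighbours, then repeatedly score every remaining candidate and add the one that most reduces predictive variance at the location of interest. This is a simplified stand-in, not the paper's criterion or code: it assumes a fixed, hypothetical squared-exponential correlation, uses the unscaled predictive variance at `x_star` as the score, and recomputes each solve from scratch instead of using the efficient updates a real implementation would. The ray-based alternative is not implemented here.

```python
import numpy as np

def corr(A, B, theta=0.1):
    """Squared-exponential correlation exp(-||a - b||^2 / theta); theta is an assumed value."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / theta)

def pred_var(Xn, x_star, theta=0.1, nugget=1e-6):
    """Unscaled GP predictive variance at x_star given the sub-design Xn."""
    K = corr(Xn, Xn, theta) + nugget * np.eye(len(Xn))
    k = corr(Xn, x_star[None, :], theta)[:, 0]
    return 1.0 + nugget - k @ np.linalg.solve(K, k)

def greedy_local_design(X, x_star, n_start=6, n_total=30, theta=0.1):
    """Hypothetical helper: indices of a greedily built local design around x_star."""
    dist = np.linalg.norm(X - x_star, axis=1)
    chosen = [int(i) for i in np.argsort(dist)[:n_start]]  # seed with nearest neighbours
    remaining = set(range(len(X))) - set(chosen)
    while len(chosen) < n_total:
        # Exhaustive scan over all remaining candidates: the costly discrete search.
        best = min(remaining,
                   key=lambda j: pred_var(X[chosen + [j]], x_star, theta))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Usage: build a 30-point local design from 500 candidates.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 2))
design = greedy_local_design(X, np.array([0.5, 0.5]))
```

Each of the `n_total - n_start` steps scores every remaining candidate, so the cost grows with the full design size; this is the kind of exhaustive subroutine that a continuous search along rays from `x_star` is meant to avoid.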