-
Diffusion Non-Additive Model for Multi-Fidelity Simulations with Tunable Precision
Authors:
Junoh Heo,
Romain Boutelet,
Chih-Li Sung
Abstract:
Computer simulations are indispensable for analyzing complex systems, yet high-fidelity models often incur prohibitive computational costs. Multi-fidelity frameworks address this challenge by combining inexpensive low-fidelity simulations with costly high-fidelity simulations to improve both accuracy and efficiency. However, certain scientific problems demand even more accurate results than the highest-fidelity simulations available, particularly when a tuning parameter controlling simulation accuracy is available, but the exact solution corresponding to a zero-valued parameter remains out of reach. In this paper, we introduce the Diffusion Non-Additive (DNA) model, inspired by generative diffusion models, which captures nonlinear dependencies across fidelity levels using Gaussian process priors and extrapolates to the exact solution. The DNA model: (i) accommodates complex, non-additive relationships across fidelity levels; (ii) employs a nonseparable covariance kernel to model interactions between the tuning parameter and input variables, improving both predictive performance and physical interpretability; and (iii) provides closed-form expressions for the posterior predictive mean and variance, allowing efficient inference and uncertainty quantification. The methodology is validated on a suite of numerical studies and real-world case studies. An R package implementing the proposed methodology is available to support practical applications.
Submitted 9 June, 2025;
originally announced June 2025.
-
Active Learning for Finite Element Simulations with Adaptive Non-Stationary Kernel Function
Authors:
Romain Boutelet,
Chih-Li Sung
Abstract:
Simulating complex physical processes across a domain of input parameters can be very computationally expensive. Multi-fidelity surrogate modeling can resolve this issue by integrating cheaper simulations with the expensive ones in order to obtain better predictions at a reasonable cost. We are specifically interested in computer experiments that involve the use of finite element methods with a real-valued tuning parameter that determines the fidelity of the numerical output. In these cases, integrating this fidelity parameter in the analysis enables us to make inference on fidelity levels that have not been observed yet. Such models have been developed, and we propose a new adaptive non-stationary kernel function which more accurately reflects the behavior of computer simulation outputs. In addition, we aim to create a sequential design based on the integrated mean squared prediction error (IMSPE) to identify the best design points across input parameters and fidelity parameter, while taking into account the computational cost associated with the fidelity parameter. We illustrate this methodology through synthetic examples and applications to finite element analysis. An $\textsf{R}$ package for the proposed methodology is provided in an open repository.
Submitted 29 March, 2025;
originally announced March 2025.
-
Uncertainty-Aware Out-of-Distribution Detection with Gaussian Processes
Authors:
Yang Chen,
Chih-Li Sung,
Arpan Kusari,
Xiaoyang Song,
Wenbo Sun
Abstract:
Deep neural networks (DNNs) are often constructed under the closed-world assumption, which may fail to generalize to the out-of-distribution (OOD) data. This leads to DNNs producing overconfident wrong predictions and can result in disastrous consequences in safety-critical applications. Existing OOD detection methods mainly rely on curating a set of OOD data for model training or hyper-parameter tuning to distinguish OOD data from training data (also known as in-distribution data or InD data). However, OOD samples are not always available during the training phase in real-world applications, hindering the OOD detection accuracy. To overcome this limitation, we propose a Gaussian-process-based OOD detection method to establish a decision boundary based on InD data only. The basic idea is to perform uncertainty quantification of the unconstrained softmax scores of a DNN via a multi-class Gaussian process (GP), and then define a score function to separate InD and potential OOD data based on their fundamental differences in the posterior predictive distribution from the GP. Two case studies on conventional image classification datasets and real-world image datasets are conducted to demonstrate that the proposed method outperforms the state-of-the-art OOD detection methods when OOD samples are not observed in the training phase.
Submitted 30 December, 2024;
originally announced December 2024.
-
Active Learning for a Recursive Non-Additive Emulator for Multi-Fidelity Computer Experiments
Authors:
Junoh Heo,
Chih-Li Sung
Abstract:
Computer simulations have become essential for analyzing complex systems, but high-fidelity simulations often come with significant computational costs. To tackle this challenge, multi-fidelity computer experiments have emerged as a promising approach that leverages both low-fidelity and high-fidelity simulations, enhancing both the accuracy and efficiency of the analysis. In this paper, we introduce a new and flexible statistical model, the Recursive Non-Additive (RNA) emulator, that integrates the data from multi-fidelity computer experiments. Unlike conventional multi-fidelity emulation approaches that rely on an additive auto-regressive structure, the proposed RNA emulator recursively captures the relationships between multi-fidelity data using Gaussian process priors without making the additive assumption, allowing the model to accommodate more complex data patterns. Importantly, we derive the posterior predictive mean and variance of the emulator, which can be efficiently computed in a closed-form manner, leading to significant improvements in computational efficiency. Additionally, based on this emulator, we introduce four active learning strategies that optimize the balance between accuracy and simulation costs to guide the selection of the fidelity level and input locations for the next simulation run. We demonstrate the effectiveness of the proposed approach in a suite of synthetic examples and a real-world problem. An R package RNAmf for the proposed methodology is provided on CRAN.
Submitted 4 June, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Advancing inverse scattering with surrogate modeling and Bayesian inference for functional inputs
Authors:
Chih-Li Sung,
Yao Song,
Ying Hung
Abstract:
Inverse scattering aims to infer information about a hidden object by using the received scattered waves and training data collected from forward mathematical models. Recent advances in computing have led to increasing attention towards functional inverse inference, which can reveal more detailed properties of a hidden object. However, rigorous studies on functional inverse, including the reconstruction of the functional input and quantification of uncertainty, remain scarce. Motivated by an inverse scattering problem where the objective is to infer the functional input representing the refractive index of a bounded scatterer, a new Bayesian framework is proposed. It contains a surrogate model that takes into account the functional inputs directly through kernel functions, and a Bayesian procedure that infers functional inputs through the posterior distribution. Furthermore, the proposed Bayesian framework is extended to reconstruct functional inverse by integrating multi-fidelity simulations, including a high-fidelity simulator solved by finite element methods and a low-fidelity simulator called the Born approximation. When compared with existing alternatives developed by finite basis expansion, the proposed method provides more accurate functional recoveries with smaller prediction variations.
Submitted 1 May, 2023;
originally announced May 2023.
-
Mesh-clustered Gaussian process emulator for partial differential equation boundary value problems
Authors:
Chih-Li Sung,
Wenjia Wang,
Liang Ding,
Xingjian Wang
Abstract:
Partial differential equations (PDEs) have become an essential tool for modeling complex physical systems. Such equations are typically solved numerically via mesh-based methods, such as finite element methods, with solutions over the spatial domain. However, obtaining these solutions is often prohibitively costly, limiting the feasibility of exploring parameters in PDEs. In this paper, we propose an efficient emulator that simultaneously predicts the solutions over the spatial domain, with theoretical justification of its uncertainty quantification. The novelty of the proposed method lies in the incorporation of the mesh node coordinates into the statistical model. In particular, the proposed method segments the mesh nodes into multiple clusters via a Dirichlet process prior and fits Gaussian process models with the same hyperparameters in each of them. Most importantly, by revealing the underlying clustering structures, the proposed method can provide valuable insights into qualitative features of the resulting dynamics that can be used to guide further investigations. Real examples are demonstrated to show that our proposed method has smaller prediction errors than its main competitors, with competitive computation time, and identifies interesting clusters of mesh nodes that possess physical significance, such as satisfying boundary conditions. An R package for the proposed methodology is provided in an open repository.
Submitted 14 February, 2024; v1 submitted 24 January, 2023;
originally announced January 2023.
-
Stacking designs: designing multi-fidelity computer experiments with target predictive accuracy
Authors:
Chih-Li Sung,
Yi Ji,
Simon Mak,
Wenjia Wang,
Tao Tang
Abstract:
In an era where scientific experiments can be very costly, multi-fidelity emulators provide a useful tool for cost-efficient predictive scientific computing. For scientific applications, the experimenter is often limited by a tight computational budget, and thus wishes to (i) maximize predictive power of the multi-fidelity emulator via a careful design of experiments, and (ii) ensure this model achieves a desired error tolerance with some notion of confidence. Existing design methods, however, do not jointly tackle objectives (i) and (ii). We propose a novel stacking design approach that addresses both goals. A multi-level reproducing kernel Hilbert space (RKHS) interpolator is first introduced to build the emulator, under which our stacking design provides a sequential approach for designing multi-fidelity runs such that a desired prediction error of $ε > 0$ is met under regularity assumptions. We then prove a novel cost complexity theorem that, under this multi-level interpolator, establishes a bound on the computation cost (for training data simulation) needed to achieve a prediction bound of $ε$. This result provides novel insights on conditions under which the proposed multi-fidelity approach improves upon a conventional RKHS interpolator which relies on a single fidelity level. Finally, we demonstrate the effectiveness of stacking designs in a suite of simulation experiments and an application to finite element analysis.
Submitted 27 October, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Functional-Input Gaussian Processes with Applications to Inverse Scattering Problems
Authors:
Chih-Li Sung,
Wenjia Wang,
Fioralba Cakoni,
Isaac Harris,
Ying Hung
Abstract:
Surrogate modeling based on Gaussian processes (GPs) has received increasing attention in the analysis of complex problems in science and engineering. Despite extensive studies on GP modeling, the developments for functional inputs are scarce. Motivated by an inverse scattering problem in which functional inputs representing the support and material properties of the scatterer are involved in the partial differential equations, a new class of kernel functions for functional inputs is introduced for GPs. Based on the proposed GP models, the asymptotic convergence properties of the resulting mean squared prediction errors are derived and the finite sample performance is demonstrated by numerical examples. In the application to inverse scattering, a surrogate model is constructed with functional inputs, which is crucial to recover the refractive index of an inhomogeneous isotropic scattering region of interest for a given far-field pattern.
Submitted 3 January, 2023; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Overview and Introduction to Development of Non-Ergodic Earthquake Ground-Motion Models
Authors:
Grigorios Lavrentiadis,
Norman A. Abrahamson,
Kuehn M. Nicolas,
Yousef Bozorgnia,
Christine A. Goulet,
Anže Babič,
Jorge Macedo,
Matjaž Dolšek,
Nicholas Gregor,
Albert R. Kottke,
Maxime Lacour,
Chenying Liu,
Xiaofeng Meng,
Van-Bang Phung,
Chih-Hsuan Sung,
Melanie Walling
Abstract:
This paper provides an overview and introduction to the development of non-ergodic ground-motion models, GMMs. It is intended for a reader who is familiar with the standard approach for developing ergodic GMMs. It starts with a brief summary of the development of ergodic GMMs and then describes different methods that are used in the development of non-ergodic GMMs with an emphasis on Gaussian Process (GP) regression, as that is currently the method preferred by most researchers contributing to this special issue. Non-ergodic modeling requires the definition of locations for the source and site characterizing the systematic source and site effects; the non-ergodic domain is divided into cells for describing the systematic path effects. Modeling the cell-specific anelastic attenuation as a GP and considerations on constraints for extrapolation of the non-ergodic GMMs are also discussed. An updated unifying notation for non-ergodic GMMs is also presented, which has been adopted by the authors of this issue.
Submitted 13 September, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Estimating functional parameters for understanding the impact of weather and government interventions on COVID-19 outbreak
Authors:
Chih-Li Sung
Abstract:
As the coronavirus disease 2019 (COVID-19) has shown profound effects on public health and the economy worldwide, it becomes crucial to assess the impact on the virus transmission and develop effective strategies to address the challenge. A new statistical model derived from the SIR epidemic model with functional parameters is proposed to understand the impact of weather and government interventions on the virus spread in the presence of asymptomatic infections among eight metropolitan areas in the United States. The model uses Bayesian inference with Gaussian process priors to study the functional parameters nonparametrically, and sensitivity analysis is adopted to investigate the main and interaction effects of these factors. This analysis reveals several important results including the potential interaction effects between weather and government interventions, which shed new light on the effective strategies for policymakers to mitigate the COVID-19 outbreak.
Submitted 10 May, 2022; v1 submitted 13 January, 2021;
originally announced January 2021.
-
Efficient calibration for imperfect epidemic models with applications to the analysis of COVID-19
Authors:
Chih-Li Sung,
Ying Hung
Abstract:
The estimation of unknown parameters in simulations, also known as calibration, is crucial for practical management of epidemics and prediction of pandemic risk. A simple yet widely used approach is to estimate the parameters by minimizing the sum of the squared distances between actual observations and simulation outputs. It is shown in this paper that this method is inefficient, particularly when the epidemic models are developed based on certain simplifications of reality, also known as imperfect models, which are commonly used in practice. To address this issue, a new estimator is introduced that is asymptotically consistent, has a smaller estimation variance than the least squares estimator, and achieves semiparametric efficiency. Numerical studies are performed to examine the finite sample performance. The proposed method is applied to the analysis of the COVID-19 pandemic for 20 countries based on the SEIR (Susceptible-Exposed-Infectious-Recovered) model with both deterministic and stochastic simulations. The estimation of the parameters, including the basic reproduction number and the average incubation period, reveals the risk of disease outbreaks in each country and provides insights for the design of public health interventions.
Submitted 22 June, 2023; v1 submitted 26 September, 2020;
originally announced September 2020.
-
A Machine Learning System for Retaining Patients in HIV Care
Authors:
Avishek Kumar,
Arthi Ramachandran,
Adolfo De Unanue,
Christina Sung,
Joe Walsh,
John Schneider,
Jessica Ridgway,
Stephanie Masiello Schuette,
Jeff Lauritsen,
Rayid Ghani
Abstract:
Retaining persons living with HIV (PLWH) in medical care is paramount to preventing new transmissions of the virus and allowing PLWH to live normal and healthy lifespans. Maintaining regular appointments with an HIV provider and taking medication daily for a lifetime is exceedingly difficult. 51% of PLWH are non-adherent with their medications and eventually drop out of medical care. Current methods of re-linking individuals to care are reactive (applied after a patient has dropped out) and hence not very effective. We describe our system to predict who is most at risk of dropping out of care, for use by the University of Chicago HIV clinic and the Chicago Department of Public Health. Models were selected based on their predictive performance under resource constraints, stability over time, and fairness. Our system is applicable as a point-of-care system in a clinical setting as well as a batch prediction system to support regular interventions at the city level. Our model performs 3x better than the baseline for the clinical model and 2.3x better than the baseline for the city-wide model. The code has been released on GitHub and we hope this methodology, particularly our focus on fairness, will be adopted by other clinics and public health agencies in order to curb the HIV epidemic.
Submitted 31 May, 2020;
originally announced June 2020.
-
A clustered Gaussian process model for computer experiments
Authors:
Chih-Li Sung,
Benjamin Haaland,
Youngdeok Hwang,
Siyuan Lu
Abstract:
The Gaussian process has been one of the important approaches for emulating computer simulations. However, the stationarity assumption of a Gaussian process and its intractability for large-scale datasets limit its applicability in practice. In this article, we propose a clustered Gaussian process model which segments the input data into multiple clusters, in each of which a Gaussian process model is fitted. The stochastic expectation-maximization algorithm is employed to efficiently fit the model. In our simulations as well as a real application to solar irradiance emulation, our proposed method has smaller mean squared errors than its main competitors, with competitive computation time, and provides valuable insights from data by discovering the clusters. An R package for the proposed methodology is provided in an open repository.
Submitted 5 November, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Calibration of inexact computer models with heteroscedastic errors
Authors:
Chih-Li Sung,
Beau David Barber,
Berkley J. Walker
Abstract:
Computer models are commonly used to represent a wide range of real systems, but they often involve some unknown parameters. Estimating the parameters by collecting physical data becomes essential in many scientific fields, ranging from engineering to biology. However, most of the existing methods are developed under the assumption that the physical data contains homoscedastic measurement errors. Motivated by an experiment of plant relative growth rates where replicates are available, we propose a new calibration method for inexact computer models with heteroscedastic measurement errors. Asymptotic properties of the parameter estimators are derived, and a goodness-of-fit test is developed to detect the presence of heteroscedasticity. Numerical examples and empirical studies demonstrate that the proposed method not only yields accurate parameter estimation, but it also provides accurate predictions for physical data in the presence of both heteroscedasticity and model misspecification.
Submitted 26 May, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Calibration for computer experiments with binary responses and application to cell adhesion study
Authors:
Chih-Li Sung,
Ying Hung,
William Rittase,
Cheng Zhu,
C. F. Jeff Wu
Abstract:
Calibration refers to the estimation of unknown parameters which are present in computer experiments but not available in physical experiments. An accurate estimation of these parameters is important because it provides a scientific understanding of the underlying system which is not available in physical experiments. Most of the work in the literature is limited to the analysis of continuous responses. Motivated by a study of cell adhesion experiments, we propose a new calibration framework for binary responses. Its application to the T cell adhesion data provides insight into the unknown values of the kinetic parameters which are difficult to determine by physical experiments due to the limitation of the existing experimental techniques.
Submitted 20 March, 2019; v1 submitted 4 June, 2018;
originally announced June 2018.
-
Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments
Authors:
Chih-Li Sung,
Wenjia Wang,
Matthew Plumlee,
Benjamin Haaland
Abstract:
The Gaussian process is a standard tool for building emulators for both deterministic and stochastic computer experiments. However, application of Gaussian process models is greatly limited in practice, particularly for large-scale and many-input computer experiments that have become typical. We propose a multi-resolution functional ANOVA model as a computationally feasible emulation alternative. More generally, this model can be used for large-scale and many-input non-linear regression problems. An overlapping group lasso approach is used for estimation, ensuring computational feasibility in a large-scale and many-input setting. New results on consistency and inference for the (potentially overlapping) group lasso in a high-dimensional setting are developed and applied to the proposed multi-resolution functional ANOVA model. Importantly, these results allow us to quantify the uncertainty in our predictions. Numerical examples demonstrate that the proposed model enjoys marked computational advantages. Data capabilities, both in terms of sample size and dimension, meet or exceed best available emulation tools while meeting or exceeding emulation accuracy.
Submitted 8 January, 2019; v1 submitted 20 September, 2017;
originally announced September 2017.
-
Distributed Training Large-Scale Deep Architectures
Authors:
Shang-Xuan Zou,
Chun-Yen Chen,
Jui-Lin Wu,
Chun-Nan Chou,
Chia-Chin Tsao,
Kuan-Chieh Tung,
Ting-Wei Lin,
Cheng-Lung Sung,
Edward Y. Chang
Abstract:
Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners to configure an effective system and fine-tune parameters to achieve desired speedup. Specifically, we develop a procedure for setting minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.
Submitted 10 August, 2017;
originally announced September 2017.
-
A generalized Gaussian process model for computer experiments with binary time series
Authors:
Chih-Li Sung,
Ying Hung,
William Rittase,
Cheng Zhu,
C. F. Jeff Wu
Abstract:
Non-Gaussian observations such as binary responses are common in some computer experiments. Motivated by the analysis of a class of cell adhesion experiments, we introduce a generalized Gaussian process model for binary responses, which shares some common features with standard GP models. In addition, the proposed model incorporates a flexible mean function that can capture different types of time series structures. Asymptotic properties of the estimators are derived, and an optimal predictor as well as its predictive distribution are constructed. Their performance is examined via two simulation studies. The methodology is applied to study computer simulations for cell adhesion experiments. The fitted model reveals important biological information in repeated cell bindings, which is not directly observable in lab experiments.
Submitted 24 September, 2018; v1 submitted 6 May, 2017;
originally announced May 2017.
-
An efficient surrogate model for emulation and physics extraction of large eddy simulations
Authors:
Simon Mak,
Chih-Li Sung,
Xingjian Wang,
Shiang-Ting Yeh,
Yu-Hung Chang,
V. Roshan Joseph,
Vigor Yang,
C. F. Jeff Wu
Abstract:
In the quest for advanced propulsion and power-generation systems, high-fidelity simulations are too computationally expensive to survey the desired design space, and a new design methodology is needed that combines engineering physics, computer simulations and statistical modeling. In this paper, we propose a new surrogate model that provides efficient prediction and uncertainty quantification of turbulent flows in swirl injectors with varying geometries, devices commonly used in many engineering applications. The novelty of the proposed method lies in the incorporation of known physical properties of the fluid flow as simplifying assumptions for the statistical model. In view of the massive simulation data at hand, which is on the order of hundreds of gigabytes, these assumptions allow for accurate flow predictions in around an hour of computation time. In contrast, existing flow emulators that forgo such simplifications may require more computation time for training and prediction than is needed for conducting the simulation itself. Moreover, by accounting for coupling mechanisms between flow variables, the proposed model can jointly reduce prediction uncertainty and extract useful flow physics, which can then be used to guide further investigations.
Submitted 26 May, 2017; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Potentially Predictive Variance Reducing Subsample Locations in Local Gaussian Process Regression
Authors:
Chih-Li Sung,
Robert B. Gramacy,
Benjamin Haaland
Abstract:
Gaussian process models are commonly used as emulators for computer experiments. However, developing a Gaussian process emulator can be computationally prohibitive when the number of experimental samples is even moderately large. Local Gaussian process approximation (Gramacy and Apley, 2015) was proposed as an accurate and computationally feasible emulation alternative. However, constructing local sub-designs specific to predictions at a particular location of interest remains a substantial computational bottleneck to the technique. In this paper, two computationally efficient neighborhood search limiting techniques are proposed, a maximum distance method and a feature approximation method. Two examples demonstrate that the proposed methods indeed save substantial computation while retaining emulation accuracy.
Submitted 26 November, 2016; v1 submitted 18 April, 2016;
originally announced April 2016.