-
Manifold-Constrained Gaussian Processes for Inference of Mixed-effects Ordinary Differential Equations with Application to Pharmacokinetics
Authors:
Yuxuan Zhao,
Samuel W. K. Wong
Abstract:
Pharmacokinetic modeling using ordinary differential equations (ODEs) has an important role in dose optimization studies, where dosing must balance sustained therapeutic efficacy with the risk of adverse side effects. Such ODE models characterize drug plasma concentration over time and allow pharmacokinetic parameters to be inferred, such as drug absorption and elimination rates. For time-course studies involving treatment groups with multiple subjects, mixed-effects ODE models are commonly used. However, existing methods tend to lack uncertainty quantification on a subject-level, for key measures such as peak or trough concentration and for making predictions of drug concentration. To address such limitations, we propose an extension of manifold-constrained Gaussian processes for inference of general mixed-effects ODE models within a Bayesian statistical framework. We evaluate our method on simulated examples, demonstrating its ability to provide fast and accurate inference for parameters and trajectories using nested optimization. To illustrate the practical efficacy of the proposed method, we provide a real data analysis of a pharmacokinetic model used for an HIV combination therapy study.
Submitted 27 June, 2025;
originally announced June 2025.
-
Fully Bayesian Sequential Design for Mean Response Surface Prediction of Heteroscedastic Stochastic Simulations
Authors:
Yuying Huang,
Samuel W. K. Wong
Abstract:
We present a fully Bayesian sequential strategy for predicting the mean response surface of heteroscedastic stochastic simulation functions. Leveraging dual Gaussian processes as the surrogate model and a criterion based on empirical expected integrated mean-square prediction error, our approach sequentially selects informative design points while fully accounting for parameter uncertainty. Sequential importance sampling is employed to efficiently update the posterior distribution of the parameters. Our strategy is tailored for expensive simulation functions, where achieving robust predictive accuracy under a limited budget is critical. We illustrate its potential advantages compared to existing approaches through synthetic examples. We then implement the proposed strategy on a real motivating application in seismic design of wood-frame podium buildings.
Submitted 11 June, 2025;
originally announced June 2025.
-
Inference for Delay Differential Equations Using Manifold-Constrained Gaussian Processes
Authors:
Yuxuan Zhao,
Samuel W. K. Wong
Abstract:
Dynamic systems described by differential equations often involve feedback among system components. When there are time delays for components to sense and respond to feedback, delay differential equation (DDE) models are commonly used. This paper considers the problem of inferring unknown system parameters, including the time delays, from noisy and sparse experimental data observed from the system. We propose an extension of manifold-constrained Gaussian processes to conduct parameter inference for DDEs, whereas the time delay parameters have posed a challenge for existing methods that bypass numerical solvers. Our method uses a Bayesian framework to impose a Gaussian process model over the system trajectory, conditioned on the manifold constraint that satisfies the DDEs. For efficient computation, a linear interpolation scheme is developed to approximate the values of the time-delayed system outputs, along with corresponding theoretical error bounds on the approximated derivatives. Two simulation examples, based on Hutchinson's equation and the lac operon system, together with a real-world application using Ontario COVID-19 data, are used to illustrate the efficacy of our method.
Submitted 21 June, 2024;
originally announced June 2024.
-
Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach
Authors:
Shiyu He,
Samuel W. K. Wong
Abstract:
We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. Through simulation studies, we demonstrate that the fusion model has superior performance in prediction accuracy across space and time compared to standalone "in situ" and "satellite" models based on only in situ or satellite data, respectively. The fusion model also generally outperforms the standalone models in terms of parameter inference. Such a modeling approach is motivated by environmental problems, and our specific focus is on the analysis and prediction of harmful algae bloom (HAB) events, where the convention is to conduct separate analyses based on either in situ samples or satellite images. A real data analysis shows that the proposed model is a necessary step towards a unified characterization of bloom dynamics and identifying the key drivers of HAB events.
Submitted 9 January, 2024;
originally announced January 2024.
-
Particle Gibbs for Likelihood-Free Inference of State Space Models with Application to Stochastic Volatility
Authors:
Zhaoran Hou,
Samuel W. K. Wong
Abstract:
State space models (SSMs) are widely used to describe dynamic systems. However, when the likelihood of the observations is intractable, parameter inference for SSMs cannot be easily carried out using standard Markov chain Monte Carlo or sequential Monte Carlo methods. In this paper, we propose a particle Gibbs sampler as a general strategy to handle SSMs with intractable likelihoods in the approximate Bayesian computation (ABC) setting. The proposed sampler incorporates a conditional auxiliary particle filter, which can help mitigate the weight degeneracy often encountered in ABC. To illustrate the methodology, we focus on a classic stochastic volatility model (SVM) used in finance and econometrics for analyzing and interpreting volatility. Simulation studies demonstrate the accuracy of our sampler for SVM parameter inference, compared to existing particle Gibbs samplers based on the conditional bootstrap filter. As a real data application, we apply the proposed sampler for fitting an SVM to S&P 500 Index time-series data during the 2008 financial crisis.
Submitted 20 December, 2023;
originally announced December 2023.
-
Understanding the Impact of Seasonal Climate Change on Canada's Economy by Region and Sector
Authors:
Shiyu He,
Trang Bui,
Yuying Huang,
Wenling Zhang,
Jie Jian,
Samuel W. K. Wong,
Tony S. Wirjanto
Abstract:
To assess the impact of climate change on the Canadian economy, we investigate and model the relationship between seasonal climate variables and economic growth across provinces and economic sectors. We further provide projections of climate change impacts up to the year 2050, taking into account the diverse climate change patterns and economic conditions across Canada. Our results indicate that rising Fall temperature anomalies have a notable adverse impact on Canadian economic growth. Province-wide, Saskatchewan and Manitoba are anticipated to experience the most substantial declines, whereas British Columbia and the Maritime provinces will be less impacted. Industry-wide, Mining is projected to see the greatest benefits, while Agriculture and Manufacturing are projected to have the most significant downturns. The disparities of climate change effects between provinces and industries highlight the need for governments to tailor their policies accordingly, and offer targeted assistance to regions and industries that are particularly vulnerable in the face of climate change. Targeted approaches to climate change mitigation are likely to be more effective than one-size-fits-all policies for the whole economy.
Submitted 6 November, 2023;
originally announced November 2023.
-
A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations
Authors:
Mingwei Xu,
Samuel W. K. Wong,
Peijun Sang
Abstract:
Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these methods can be hindered by estimation error, especially when only sparse time-course observations are available. We present a Bayesian collocation framework that operates on the integrated form of the ODEs and also avoids the expensive use of numerical solvers. Our methodology has the capability to handle general nonlinear ODE systems. We demonstrate the accuracy of the proposed method through simulation studies, where the estimated parameters and recovered system trajectories are compared with other recent methods. A real data example is also provided.
Submitted 23 October, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
A Kriging Metamodel with Adaptive Sampling for Seismic Evaluation of Podium Buildings
Authors:
Yuying Huang,
Zhiyong Chen,
Samuel W. K. Wong
Abstract:
In this paper, nonlinear time-history dynamic analyses of selected earthquake ground motions are conducted on designated wood-frame podium buildings and the resulting inter-story drifts are analyzed. We aim to construct a reliable region where performance-based seismic design criteria are met, such that a two-step analysis procedure can be used with high confidence. We develop a kriging metamodel with tailored adaptive sampling methods to achieve this goal in a computationally efficient manner. The input variables we consider are the normalized stiffness ratio and the normalized mass ratio of the podium building. We took a six-story wood frame built upon a one-story concrete podium as a case study for our methodology, where our results indicate that the two-step analysis procedure may be used with high confidence if its normalized stiffness ratio is at least 38 and its normalized mass ratio is between 0.5 and 1.5.
Submitted 28 January, 2023;
originally announced January 2023.
-
Estimating and Assessing Differential Equation Models with Time-Course Data
Authors:
Samuel W. K. Wong,
Shihao Yang,
S. C. Kou
Abstract:
Ordinary differential equation (ODE) models are widely used to describe chemical or biological processes. This article considers the estimation and assessment of such models on the basis of time-course data. Due to experimental limitations, time-course data are often noisy and some components of the system may not be observed. Furthermore, the computational demands of numerical integration have hindered the widespread adoption of time-course analysis using ODEs. To address these challenges, we explore the efficacy of the recently developed MAGI (MAnifold-constrained Gaussian process Inference) method for ODE inference. First, via a range of examples we show that MAGI is capable of inferring the parameters and system trajectories, including unobserved components, with appropriate uncertainty quantification. Second, we illustrate how MAGI can be used to assess and select different ODE models with time-course data based on MAGI's efficient computation of model predictions. Overall, we believe MAGI is a useful method for the analysis of time-course data in the context of ODE models, which bypasses the need for any numerical integration.
Submitted 13 February, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Estimating Boltzmann Averages for Protein Structural Quantities Using Sequential Monte Carlo
Authors:
Zhaoran Hou,
Samuel W. K. Wong
Abstract:
Sequential Monte Carlo (SMC) methods are widely used to draw samples from intractable target distributions. Particle degeneracy can hinder the use of SMC when the target distribution is highly constrained or multimodal. As a motivating application, we consider the problem of sampling protein structures from the Boltzmann distribution. This paper proposes a general SMC method that propagates multiple descendants for each particle, followed by resampling to maintain the desired number of particles. Simulation studies demonstrate the efficacy of the method for tackling the protein sampling problem. As a real data example, we use our method to estimate the number of atomic contacts for a key segment of the SARS-CoV-2 viral spike protein.
Submitted 25 October, 2022;
originally announced October 2022.
-
A Comparative Study of Compartmental Models for COVID-19 Transmission in Ontario, Canada
Authors:
Yuxuan Zhao,
Samuel W. K. Wong
Abstract:
The number of confirmed COVID-19 cases reached over 1.3 million in Ontario, Canada by June 4, 2022. The continued spread of the virus underlying COVID-19 has been spurred by the emergence of variants since the initial outbreak in December, 2019. Much attention has thus been devoted to tracking and modelling the transmission of COVID-19. Compartmental models are commonly used to mimic epidemic transmission mechanisms and are easy to understand. Their performance in real-world settings, however, needs to be more thoroughly assessed. In this comparative study, we examine five compartmental models -- four existing ones and an extended model that we propose -- and analyze their ability to describe COVID-19 transmission in Ontario from January 2022 to June 2022.
Submitted 24 October, 2022;
originally announced October 2022.
-
MAGI: A Package for Inference of Dynamic Systems from Noisy and Sparse Data via Manifold-constrained Gaussian Processes
Authors:
Samuel W. K. Wong,
Shihao Yang,
S. C. Kou
Abstract:
This article presents the MAGI software package for the inference of dynamic systems. The focus of MAGI is on dynamics modeled by nonlinear ordinary differential equations with unknown parameters. While such models are widely used in science and engineering, the available experimental data for parameter estimation may be noisy and sparse. Furthermore, some system components may be entirely unobserved. MAGI solves this inference problem with the help of manifold-constrained Gaussian processes within a Bayesian statistical framework, whereas unobserved components have posed a significant challenge for existing software. We use several realistic examples to illustrate the functionality of MAGI. The user may choose to use the package in any of the R, MATLAB, and Python environments.
Submitted 16 October, 2023; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Monte Carlo sampling of flexible protein structures: an application to the SARS-CoV-2 omicron variant
Authors:
Samuel W. K. Wong
Abstract:
Proteins can exhibit dynamic structural flexibility as they carry out their functions, especially in binding regions that interact with other molecules. For the key SARS-CoV-2 spike protein that facilitates COVID-19 infection, studies have previously identified several such highly flexible regions with therapeutic importance. However, protein structures available from the Protein Data Bank are presented as static snapshots that may not adequately depict this flexibility, and furthermore these cannot keep pace with new mutations and variants. In this paper we present a sequential Monte Carlo method for broadly sampling the 3-D conformational space of protein structure, according to the Boltzmann distribution of a given energy function. Our approach is distinct from previous sampling methods that focus on finding the lowest-energy conformation for predicting a single stable structure. We exemplify our method on the SARS-CoV-2 omicron variant as an application of timely interest. Our results identify sequence positions 495-508 as a key region where omicron mutations have the most impact on the space of possible conformations, which coincides with the findings of other preliminary studies on the binding properties of the omicron variant.
Submitted 4 February, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Knots and their effect on the tensile strength of lumber: a case study
Authors:
Shuxian Fan,
Samuel W. K. Wong,
James V. Zidek
Abstract:
When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots are the most common visual characteristics of lumber and result from the growth of tree branches. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, industry grading rules that govern knots are informed by subjective judgment to some extent, particularly the spatial interaction of knots and their relationship with lumber strength. This case study reports the results of an experiment that investigated and modelled the strength-reducing effects of knots on a sample of Douglas Fir lumber. Experimental data were obtained by taking scans of lumber surfaces and applying tensile strength testing. The modelling approach presented incorporates all relevant knot information in a Bayesian framework, thereby contributing a more refined way of managing the quality of manufactured lumber.
Submitted 14 February, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Multimodel Bayesian Analysis of Load Duration Effects in Lumber Reliability
Authors:
Yunfeng Yang,
Martin Lysy,
Samuel W. K. Wong
Abstract:
This paper evaluates the reliability of lumber, accounting for the duration-of-load (DOL) effect under different load profiles based on a multimodel Bayesian approach. Three individual DOL models previously used for reliability assessment are considered: the US model, the Canadian model, and the Gamma process model. Procedures for stochastic generation of residential, snow, and wind loads are also described. We propose Bayesian model-averaging (BMA) as a method for combining the reliability estimates of individual models under a given load profile that coherently accounts for statistical uncertainty in the choice of model and parameter values. The method is applied to the analysis of a Hemlock experimental dataset, where the BMA results are illustrated via estimated reliability indices together with 95% interval bands.
Submitted 22 October, 2021;
originally announced October 2021.
-
Conformational variability of loops in the SARS-CoV-2 spike protein
Authors:
Samuel W. K. Wong,
Zongjun Liu
Abstract:
The SARS-CoV-2 spike (S) protein facilitates viral infection, and has been the focus of many structure determination efforts. Its flexible loop regions are known to be involved in protein binding and may adopt multiple conformations. This paper identifies the S protein loops and studies their conformational variability based on the available Protein Data Bank (PDB) structures. While most loops had essentially one stable conformation, 17 of 44 loop regions were observed to be structurally variable with multiple substantively distinct conformations based on a cluster analysis. Loop modeling methods were then applied to the S protein loop targets, and the prediction accuracies discussed in relation to the characteristics of the conformational clusters identified. Loops with multiple conformations were found to be challenging to model based on a single structural template.
Submitted 13 October, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia
Authors:
Geoffrey McGregor,
Jennifer Tippett,
Andy T. S. Wan,
Mengxiao Wang,
Samuel W. K. Wong
Abstract:
We study the effects of physical distancing measures for the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters from March to December of 2020. In the absence of COVID-19 variants and vaccinations during this period, we examine the regionalized basic reproduction number, modelled prevalence, relative reduction in contact due to physical distancing, and proportion of anticipated cases that have been tested and reported. We observe significant differences between the regional and provincial-wide models and demonstrate the hierarchical regional model can better estimate regional prevalence, especially in rural regions. These results indicate that it can be useful to apply similar regional models to other parts of Canada or other countries.
Submitted 13 November, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein
Authors:
Shiyu He,
Samuel W. K. Wong
Abstract:
As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein's 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.
Submitted 30 January, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
Ellipse Detection and Localization with Applications to Knots in Sawn Lumber Images
Authors:
Shenyi Pan,
Shuxian Fan,
Samuel W. K. Wong,
James V. Zidek,
Helge Rhodin
Abstract:
While general object detection has seen tremendous progress, localization of elliptical objects has received little attention in the literature. Our motivating application is the detection of knots in sawn timber images, which is an important problem since the number and types of knots are visual characteristics that adversely affect the quality of sawn timber. We demonstrate how models can be tailored to the elliptical shape and thereby improve on general purpose detectors; more generally, elliptical defects are common in industrial production, such as enclosed air bubbles when casting glass or plastic. In this paper, we adapt the Faster R-CNN with its Region Proposal Network (RPN) to model elliptical objects with a Gaussian function, and extend the existing Gaussian Proposal Network (GPN) architecture by adding the region-of-interest pooling and regression branches, as well as using the Wasserstein distance as the loss function to predict the precise locations of elliptical objects. Our proposed method has promising results on the lumber knot dataset: knots are detected with an average intersection over union of 73.05%, compared to 63.63% for general purpose detectors. Specific to the lumber application, we also propose an algorithm to correct any misalignment in the raw timber images during scanning, and contribute the first open-source lumber knot dataset by labeling the elliptical knots in the preprocessed images.
Submitted 9 November, 2020;
originally announced November 2020.
-
Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes
Authors:
Shihao Yang,
Samuel W. K. Wong,
S. C. Kou
Abstract:
Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data is a vital task in many fields. We propose a fast and accurate method, MAGI (MAnifold-constrained Gaussian process Inference), for this task. MAGI uses a Gaussian process model over time-series data, explicitly conditioned on the manifold constraint that derivatives of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. MAGI is also suitable for inference with unobserved system components, which often occur in real experiments. MAGI is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which incorporates the ODE system through the manifold constraint. We demonstrate the accuracy and speed of MAGI using realistic examples based on physical experiments.
Submitted 21 February, 2021; v1 submitted 15 September, 2020;
originally announced September 2020.
-
Assessing the impacts of mutations to the structure of COVID-19 spike protein via sequential Monte Carlo
Authors:
Samuel W. K. Wong
Abstract:
Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. In this paper, we use a data science perspective to study the potential structural impacts due to ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this protein structure and complement laboratory efforts.
Submitted 11 June, 2020; v1 submitted 1 May, 2020;
originally announced May 2020.
-
Calibrating wood products for load duration and rate: A statistical look at three damage models
Authors:
Samuel W. K. Wong
Abstract:
Lumber and wood-based products are versatile construction materials that are susceptible to weakening as a result of applied stresses. To assess the effects of load duration and rate, experiments have been carried out by applying preset load profiles to sample specimens. This paper studies these effects via a damage modeling approach, by considering three models in the literature: the Gerhards and Foschi accumulated damage models, and a degradation model based on the gamma process. We present a statistical framework for fitting these models to failure time data generated by a combination of ramp and constant load settings, and show how estimation uncertainty can be quantified. The models and methods are illustrated and compared via a novel analysis of a Hemlock lumber dataset. Practical usage of the fitted damage models is demonstrated with an application to long-term reliability prediction under stochastic future loadings.
Submitted 9 February, 2020;
originally announced February 2020.
-
A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification
Authors:
Katherine C. Kempfert,
Yishi Wang,
Cuixian Chen,
Samuel W. K. Wong
Abstract:
Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed in the application of machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown to be another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques is evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After application of the DR methods, a linear support vector machine (SVM) is deployed with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.
Submitted 4 October, 2019;
originally announced October 2019.
-
Where Does Haydn End and Mozart Begin? Composer Classification of String Quartets
Authors:
Katherine C. Kempfert,
Samuel W. K. Wong
Abstract:
For centuries, the history and music of Franz Joseph Haydn and Wolfgang Amadeus Mozart have been compared by scholars. Recently, the growing field of music information retrieval (MIR) has offered quantitative analyses to complement traditional qualitative analyses of these composers. In this MIR study, we classify the composer of Haydn and Mozart string quartets based on the content of their scores. Our contribution is an interpretable statistical and machine learning approach that provides high classification accuracies and musical relevance. We develop novel global features that are automatically computed from symbolic data and informed by musicological Haydn-Mozart comparative studies, particularly relating to the sonata form. Several of these proposed features are found to be important for distinguishing between Haydn and Mozart string quartets. Our Bayesian logistic regression model attains leave-one-out classification accuracies over 84%, higher than prior works and providing interpretations that could aid in assessing musicological claims. Overall, our work can help expand the longstanding dialogue surrounding Haydn and Mozart and exemplify the benefit of interpretable machine learning in MIR, with potential applications to music generation and classification of other classical composers.
Submitted 29 July, 2020; v1 submitted 13 September, 2018;
originally announced September 2018.
-
On the circular correlation coefficients for bivariate von Mises distributions on a torus
Authors:
Saptarshi Chakraborty,
Samuel W. K. Wong
Abstract:
This paper studies circular correlations for the bivariate von Mises sine and cosine distributions. These are two simple and appealing models for bivariate angular data with five parameters each that have interpretations comparable to those in the ordinary bivariate normal model. However, the variability and association of the angle pairs cannot be easily deduced from the model parameters unlike the bivariate normal. Thus to compute such summary measures, tools from circular statistics are needed. We derive analytic expressions and study the properties of the Jammalamadaka-Sarma and Fisher-Lee circular correlation coefficients for the von Mises sine and cosine models. Likelihood-based inference of these coefficients from sample data is then presented. The correlation coefficients are illustrated with numerical and visual examples, and the maximum likelihood estimators are assessed on simulated and real data, with comparisons to their non-parametric counterparts. Implementations of these computations for practical use are provided in our R package BAMBI.
Submitted 26 May, 2020; v1 submitted 23 April, 2018;
originally announced April 2018.
-
BAMBI: An R package for Fitting Bivariate Angular Mixture Models
Authors:
Saptarshi Chakraborty,
Samuel W. K. Wong
Abstract:
Statistical analyses of directional or angular data have applications in a variety of fields, such as geology, meteorology and bioinformatics. There is substantial literature on descriptive and inferential techniques for univariate angular data, with the bivariate (or more generally, multivariate) cases receiving more attention in recent years. More specifically, the bivariate wrapped normal, von Mises sine and von Mises cosine distributions, and mixtures thereof, have been proposed for practical use. However, there is a lack of software implementing these distributions and the associated inferential techniques. In this article, we introduce BAMBI, an R package for analyzing bivariate (and univariate) angular data. We implement random data generation, density evaluation, and computation of theoretical summary measures (variances and correlation coefficients) for the three aforementioned bivariate angular distributions, as well as two univariate angular distributions: the univariate wrapped normal and the univariate von Mises distribution. The major contribution of BAMBI to statistical computing is in providing Bayesian methods for modeling angular data using finite mixtures of these distributions. We also provide functions for visual and numerical diagnostics and Bayesian inference for the fitted models. In this article, we first provide a brief review of the distributions and techniques used in BAMBI, then describe the capabilities of the package, and finally conclude with demonstrations of mixture model fitting using BAMBI on the two real datasets included in the package, one univariate and one bivariate.
Submitted 17 March, 2019; v1 submitted 25 August, 2017;
originally announced August 2017.
-
Sequential Decision Model for Inference and Prediction on Non-Uniform Hypergraphs with Application to Knot Matching from Computational Forestry
Authors:
Seong-Hwan Jun,
Samuel W. K. Wong,
James V. Zidek,
Alexandre Bouchard-Côté
Abstract:
In this paper, we consider the knot matching problem arising in computational forestry. The knot matching problem is an important problem that needs to be solved to advance the state of the art in automatic strength prediction of lumber. We show that this problem can be formulated as a quadripartite matching problem and develop a sequential decision model that admits efficient parameter estimation along with a sequential Monte Carlo sampler on graph matching that can be utilized for rapid sampling of graph matching. We demonstrate the effectiveness of our methods on 30 manually annotated boards and present findings from various simulation studies to provide further evidence supporting the efficacy of our methods.
Submitted 24 August, 2017;
originally announced August 2017.
-
The duration of load effect in lumber as stochastic degradation
Authors:
Samuel W. K. Wong,
James V. Zidek
Abstract:
This paper proposes a gamma process for modelling the damage that accumulates over time in the lumber used in structural engineering applications when stress is applied. The model separates the stochastic processes representing features internal to the piece of lumber on the one hand, from those representing external forces due to applied dead and live loads. The model applies those external forces through a time-varying population level function designed for time-varying loads. The application of this type of model, which is standard in reliability analysis, is novel in this context, which has been dominated by accumulated damage models (ADMs) over more than half a century. The proposed model is compared with one of the traditional ADMs. Our statistical results based on a Bayesian analysis of experimental data highlight the limitations of using accelerated testing data to assess long-term reliability, as seen in the wide posterior intervals. This suggests the need for more comprehensive testing in future applications, or to encode appropriate expert knowledge in the priors used for Bayesian analysis.
Submitted 23 August, 2017;
originally announced August 2017.
-
Dimensional and statistical foundations for accumulated damage models
Authors:
Samuel W. K. Wong,
James V. Zidek
Abstract:
This paper develops a framework for creating damage accumulation models for engineered wood products by invoking the classical theory of non-dimensionalization. The result is a general class of such models. Both the US and Canadian damage accumulation models are revisited. It is shown how the former may be generalized within that framework while deficiencies are discovered in the latter and overcome. Use of modern Bayesian statistical methods for estimating the parameters in these models is proposed along with an illustrative application of these methods to a ramp load dataset.
Submitted 9 August, 2017;
originally announced August 2017.
-
Bayesian analysis of accumulated damage models in lumber reliability
Authors:
Chun-Hao Yang,
James V. Zidek,
Samuel W. K. Wong
Abstract:
Wood products that are subjected to sustained stress over a period of long duration may weaken, and this effect must be considered in models for the long-term reliability of lumber. The damage accumulation approach has been widely used for this purpose to set engineering standards. In this article, we revisit an accumulated damage model and propose a Bayesian framework for analysis. For parameter estimation and uncertainty quantification, we adopt approximate Bayesian computation (ABC) techniques to handle the complexities of the model. We demonstrate the effectiveness of our approach using both simulated and real data, and apply our fitted model to analyze long-term lumber reliability under a stochastic live loading scenario.
Submitted 14 June, 2017;
originally announced June 2017.