arXiv:2506.22313

Manifold-Constrained Gaussian Processes for Inference of Mixed-effects Ordinary Differential Equations with Application to Pharmacokinetics

Authors: Yuxuan Zhao, Samuel W. K. Wong

Abstract: Pharmacokinetic modeling using ordinary differential equations (ODEs) has an important role in dose optimization studies, where dosing must balance sustained therapeutic efficacy with the risk of adverse side effects. Such ODE models characterize drug plasma concentration over time and allow pharmacokinetic parameters to be inferred, such as drug absorption and elimination rates. For time-course studies involving treatment groups with multiple subjects, mixed-effects ODE models are commonly used. However, existing methods tend to lack uncertainty quantification on a subject-level, for key measures such as peak or trough concentration and for making predictions of drug concentration. To address such limitations, we propose an extension of manifold-constrained Gaussian processes for inference of general mixed-effects ODE models within a Bayesian statistical framework. We evaluate our method on simulated examples, demonstrating its ability to provide fast and accurate inference for parameters and trajectories using nested optimization. To illustrate the practical efficacy of the proposed method, we provide a real data analysis of a pharmacokinetic model used for an HIV combination therapy study.

Submitted 27 June, 2025; originally announced June 2025.

Comments: 34 pages, 4 figures

arXiv:2506.09722

Fully Bayesian Sequential Design for Mean Response Surface Prediction of Heteroscedastic Stochastic Simulations

Authors: Yuying Huang, Samuel W. K. Wong

Abstract: We present a fully Bayesian sequential strategy for predicting the mean response surface of heteroscedastic stochastic simulation functions. Leveraging dual Gaussian processes as the surrogate model and a criterion based on empirical expected integrated mean-square prediction error, our approach sequentially selects informative design points while fully accounting for parameter uncertainty. Sequential importance sampling is employed to efficiently update the posterior distribution of the parameters. Our strategy is tailored for expensive simulation functions, where achieving robust predictive accuracy under a limited budget is critical. We illustrate its potential advantages compared to existing approaches through synthetic examples. We then implement the proposed strategy on a real motivating application in seismic design of wood-frame podium buildings.

Submitted 11 June, 2025; originally announced June 2025.

Comments: 37 pages, 8 figures

arXiv:2406.15170

Inference for Delay Differential Equations Using Manifold-Constrained Gaussian Processes

Authors: Yuxuan Zhao, Samuel W. K. Wong

Abstract: Dynamic systems described by differential equations often involve feedback among system components. When there are time delays for components to sense and respond to feedback, delay differential equation (DDE) models are commonly used. This paper considers the problem of inferring unknown system parameters, including the time delays, from noisy and sparse experimental data observed from the system. We propose an extension of manifold-constrained Gaussian processes to conduct parameter inference for DDEs, whereas the time delay parameters have posed a challenge for existing methods that bypass numerical solvers. Our method uses a Bayesian framework to impose a Gaussian process model over the system trajectory, conditioned on the manifold constraint that satisfies the DDEs. For efficient computation, a linear interpolation scheme is developed to approximate the values of the time-delayed system outputs, along with corresponding theoretical error bounds on the approximated derivatives. Two simulation examples, based on Hutchinson's equation and the lac operon system, together with a real-world application using Ontario COVID-19 data, are used to illustrate the efficacy of our method.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 42 pages, 8 figures

arXiv:2401.04723

Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach

Authors: Shiyu He, Samuel W. K. Wong

Abstract: We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. Through simulation studies, we demonstrate that the fusion model has superior performance in prediction accuracy across space and time compared to standalone "in situ" and "satellite" models based on only in situ or satellite data, respectively. The fusion model also generally outperforms the standalone models in terms of parameter inference. Such a modeling approach is motivated by environmental problems, and our specific focus is on the analysis and prediction of harmful algae bloom (HAB) events, where the convention is to conduct separate analyses based on either in situ samples or satellite images. A real data analysis shows that the proposed model is a necessary step towards a unified characterization of bloom dynamics and identifying the key drivers of HAB events.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 23 pages, 7 figures

arXiv:2312.13044

Particle Gibbs for Likelihood-Free Inference of State Space Models with Application to Stochastic Volatility

Authors: Zhaoran Hou, Samuel W. K. Wong

Abstract: State space models (SSMs) are widely used to describe dynamic systems. However, when the likelihood of the observations is intractable, parameter inference for SSMs cannot be easily carried out using standard Markov chain Monte Carlo or sequential Monte Carlo methods. In this paper, we propose a particle Gibbs sampler as a general strategy to handle SSMs with intractable likelihoods in the approximate Bayesian computation (ABC) setting. The proposed sampler incorporates a conditional auxiliary particle filter, which can help mitigate the weight degeneracy often encountered in ABC. To illustrate the methodology, we focus on a classic stochastic volatility model (SVM) used in finance and econometrics for analyzing and interpreting volatility. Simulation studies demonstrate the accuracy of our sampler for SVM parameter inference, compared to existing particle Gibbs samplers based on the conditional bootstrap filter. As a real data application, we apply the proposed sampler for fitting an SVM to S&P 500 Index time-series data during the 2008 financial crisis.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 23 pages

arXiv:2311.03497

Understanding the Impact of Seasonal Climate Change on Canada's Economy by Region and Sector

Authors: Shiyu He, Trang Bui, Yuying Huang, Wenling Zhang, Jie Jian, Samuel W. K. Wong, Tony S. Wirjanto

Abstract: To assess the impact of climate change on the Canadian economy, we investigate and model the relationship between seasonal climate variables and economic growth across provinces and economic sectors. We further provide projections of climate change impacts up to the year 2050, taking into account the diverse climate change patterns and economic conditions across Canada. Our results indicate that rising Fall temperature anomalies have a notable adverse impact on Canadian economic growth. Province-wide, Saskatchewan and Manitoba are anticipated to experience the most substantial declines, whereas British Columbia and the Maritime provinces will be less impacted. Industry-wide, Mining is projected to see the greatest benefits, while Agriculture and Manufacturing are projected to have the most significant downturns. The disparities of climate change effects between provinces and industries highlight the need for governments to tailor their policies accordingly, and offer targeted assistance to regions and industries that are particularly vulnerable in the face of climate change. Targeted approaches to climate change mitigation are likely to be more effective than one-size-fits-all policies for the whole economy.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 25 pages, 7 figures

arXiv:2304.02127

A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations

Authors: Mingwei Xu, Samuel W. K. Wong, Peijun Sang

Abstract: Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these methods can be hindered by estimation error, especially when only sparse time-course observations are available. We present a Bayesian collocation framework that operates on the integrated form of the ODEs and also avoids the expensive use of numerical solvers. Our methodology has the capability to handle general nonlinear ODE systems. We demonstrate the accuracy of the proposed method through simulation studies, where the estimated parameters and recovered system trajectories are compared with other recent methods. A real data example is also provided.

Submitted 23 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2301.12302

A Kriging Metamodel with Adaptive Sampling for Seismic Evaluation of Podium Buildings

Authors: Yuying Huang, Zhiyong Chen, Samuel W. K. Wong

Abstract: In this paper, nonlinear time-history dynamic analyses of selected earthquake ground motions are conducted on designated wood-frame podium buildings and the resulting inter-story drifts are analyzed. We aim to construct a reliable region where performance-based seismic design criteria are met, such that a two-step analysis procedure can be used with high confidence. We develop a kriging metamodel with tailored adaptive sampling methods to achieve this goal in a computationally efficient manner. The input variables we consider are the normalized stiffness ratio and the normalized mass ratio of the podium building. We took a six-story wood frame built upon a one-story concrete podium as a case study for our methodology, where our results indicate that the two-step analysis procedure may be used with high confidence if its normalized stiffness ratio is at least 38 and its normalized mass ratio is between 0.5 and 1.5.

Submitted 28 January, 2023; originally announced January 2023.

Comments: 14 pages, 2 figures

arXiv:2212.10653

Estimating and Assessing Differential Equation Models with Time-Course Data

Authors: Samuel W. K. Wong, Shihao Yang, S. C. Kou

Abstract: Ordinary differential equation (ODE) models are widely used to describe chemical or biological processes. This article considers the estimation and assessment of such models on the basis of time-course data. Due to experimental limitations, time-course data are often noisy and some components of the system may not be observed. Furthermore, the computational demands of numerical integration have hindered the widespread adoption of time-course analysis using ODEs. To address these challenges, we explore the efficacy of the recently developed MAGI (MAnifold-constrained Gaussian process Inference) method for ODE inference. First, via a range of examples we show that MAGI is capable of inferring the parameters and system trajectories, including unobserved components, with appropriate uncertainty quantification. Second, we illustrate how MAGI can be used to assess and select different ODE models with time-course data based on MAGI's efficient computation of model predictions. Overall, we believe MAGI is a useful method for the analysis of time-course data in the context of ODE models, which bypasses the need for any numerical integration.

Submitted 13 February, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 26 pages, 8 figures, with code supplement

arXiv:2210.14216

Estimating Boltzmann Averages for Protein Structural Quantities Using Sequential Monte Carlo

Authors: Zhaoran Hou, Samuel W. K. Wong

Abstract: Sequential Monte Carlo (SMC) methods are widely used to draw samples from intractable target distributions. Particle degeneracy can hinder the use of SMC when the target distribution is highly constrained or multimodal. As a motivating application, we consider the problem of sampling protein structures from the Boltzmann distribution. This paper proposes a general SMC method that propagates multiple descendants for each particle, followed by resampling to maintain the desired number of particles. Simulation studies demonstrate the efficacy of the method for tackling the protein sampling problem. As a real data example, we use our method to estimate the number of atomic contacts for a key segment of the SARS-CoV-2 viral spike protein.

Submitted 25 October, 2022; originally announced October 2022.

Comments: 20 pages

arXiv:2210.13323

A Comparative Study of Compartmental Models for COVID-19 Transmission in Ontario, Canada

Authors: Yuxuan Zhao, Samuel W. K. Wong

Abstract: The number of confirmed COVID-19 cases reached over 1.3 million in Ontario, Canada by June 4, 2022. The continued spread of the virus underlying COVID-19 has been spurred by the emergence of variants since the initial outbreak in December, 2019. Much attention has thus been devoted to tracking and modelling the transmission of COVID-19. Compartmental models are commonly used to mimic epidemic transmission mechanisms and are easy to understand. Their performance in real-world settings, however, needs to be more thoroughly assessed. In this comparative study, we examine five compartmental models -- four existing ones and an extended model that we propose -- and analyze their ability to describe COVID-19 transmission in Ontario from January 2022 to June 2022.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 26 pages, 8 figures

arXiv:2203.06066

MAGI: A Package for Inference of Dynamic Systems from Noisy and Sparse Data via Manifold-constrained Gaussian Processes

Authors: Samuel W. K. Wong, Shihao Yang, S. C. Kou

Abstract: This article presents the MAGI software package for the inference of dynamic systems. The focus of MAGI is on dynamics modeled by nonlinear ordinary differential equations with unknown parameters. While such models are widely used in science and engineering, the available experimental data for parameter estimation may be noisy and sparse. Furthermore, some system components may be entirely unobserved. MAGI solves this inference problem with the help of manifold-constrained Gaussian processes within a Bayesian statistical framework, whereas unobserved components have posed a significant challenge for existing software. We use several realistic examples to illustrate the functionality of MAGI. The user may choose to use the package in any of the R, MATLAB, and Python environments.

Submitted 16 October, 2023; v1 submitted 11 March, 2022; originally announced March 2022.

Comments: 47 pages, 10 figures

arXiv:2201.07775

Monte Carlo sampling of flexible protein structures: an application to the SARS-CoV-2 omicron variant

Authors: Samuel W. K. Wong

Abstract: Proteins can exhibit dynamic structural flexibility as they carry out their functions, especially in binding regions that interact with other molecules. For the key SARS-CoV-2 spike protein that facilitates COVID-19 infection, studies have previously identified several such highly flexible regions with therapeutic importance. However, protein structures available from the Protein Data Bank are presented as static snapshots that may not adequately depict this flexibility, and furthermore these cannot keep pace with new mutations and variants. In this paper we present a sequential Monte Carlo method for broadly sampling the 3-D conformational space of protein structure, according to the Boltzmann distribution of a given energy function. Our approach is distinct from previous sampling methods that focus on finding the lowest-energy conformation for predicting a single stable structure. We exemplify our method on the SARS-CoV-2 omicron variant as an application of timely interest. Our results identify sequence positions 495-508 as a key region where omicron mutations have the most impact on the space of possible conformations, which coincides with the findings of other preliminary studies on the binding properties of the omicron variant.

Submitted 4 February, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: 20 pages, 4 figures

arXiv:2201.03464

Knots and their effect on the tensile strength of lumber: a case study

Authors: Shuxian Fan, Samuel W. K. Wong, James V. Zidek

Abstract: When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots are the most common visual characteristics of lumber, that result from the growth of tree branches. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, industry grading rules that govern knots are informed by subjective judgment to some extent, particularly the spatial interaction of knots and their relationship with lumber strength. This case study reports the results of an experiment that investigated and modelled the strength-reducing effects of knots on a sample of Douglas Fir lumber. Experimental data were obtained by taking scans of lumber surfaces and applying tensile strength testing. The modelling approach presented incorporates all relevant knot information in a Bayesian framework, thereby contributing a more refined way of managing the quality of manufactured lumber.

Submitted 14 February, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

Comments: 20 pages, 4 figures

arXiv:2110.11896

Multimodel Bayesian Analysis of Load Duration Effects in Lumber Reliability

Authors: Yunfeng Yang, Martin Lysy, Samuel W. K. Wong

Abstract: This paper evaluates the reliability of lumber, accounting for the duration-of-load (DOL) effect under different load profiles based on a multimodel Bayesian approach. Three individual DOL models previously used for reliability assessment are considered: the US model, the Canadian model, and the Gamma process model. Procedures for stochastic generation of residential, snow, and wind loads are also described. We propose Bayesian model-averaging (BMA) as a method for combining the reliability estimates of individual models under a given load profile that coherently accounts for statistical uncertainty in the choice of model and parameter values. The method is applied to the analysis of a Hemlock experimental dataset, where the BMA results are illustrated via estimated reliability indices together with 95% interval bands.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 15 pages, 2 figures

arXiv:2105.08835

Conformational variability of loops in the SARS-CoV-2 spike protein

Authors: Samuel W. K. Wong, Zongjun Liu

Abstract: The SARS-CoV-2 spike (S) protein facilitates viral infection, and has been the focus of many structure determination efforts. Its flexible loop regions are known to be involved in protein binding and may adopt multiple conformations. This paper identifies the S protein loops and studies their conformational variability based on the available Protein Data Bank (PDB) structures. While most loops had essentially one stable conformation, 17 of 44 loop regions were observed to be structurally variable with multiple substantively distinct conformations based on a cluster analysis. Loop modeling methods were then applied to the S protein loop targets, and the prediction accuracies discussed in relation to the characteristics of the conformational clusters identified. Loops with multiple conformations were found to be challenging to model based on a single structural template.

Submitted 13 October, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: 24 pages

arXiv:2104.10878

doi 10.3934/math.2022376

Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia

Authors: Geoffrey McGregor, Jennifer Tippett, Andy T. S. Wan, Mengxiao Wang, Samuel W. K. Wong

Abstract: We study the effects of physical distancing measures for the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters between March to December of 2020. In the absence of COVID-19 variants and vaccinations during this period, we examine the regionalized basic reproduction number, modelled prevalence, relative reduction in contact due to physical distancing, and proportion of anticipated cases that have been tested and reported. We observe significant differences between the regional and provincial-wide models and demonstrate the hierarchical regional model can better estimate regional prevalence, especially in rural regions. These results indicate that it can be useful to apply similar regional models to other parts of Canada or other countries.

Submitted 13 November, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

Comments: 35 pages, 16 figures

Journal ref: AIMS Mathematics, 2022, 7(4): 6743-6778

arXiv:2101.02304

Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein

Authors: Shiyu He, Samuel W. K. Wong

Abstract: As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein's 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.

Submitted 30 January, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

Comments: 21 pages, 5 figures

arXiv:2011.04844

Ellipse Detection and Localization with Applications to Knots in Sawn Lumber Images

Authors: Shenyi Pan, Shuxian Fan, Samuel W. K. Wong, James V. Zidek, Helge Rhodin

Abstract: While general object detection has seen tremendous progress, localization of elliptical objects has received little attention in the literature. Our motivating application is the detection of knots in sawn timber images, which is an important problem since the number and types of knots are visual characteristics that adversely affect the quality of sawn timber. We demonstrate how models can be tailored to the elliptical shape and thereby improve on general purpose detectors; more generally, elliptical defects are common in industrial production, such as enclosed air bubbles when casting glass or plastic. In this paper, we adapt the Faster R-CNN with its Region Proposal Network (RPN) to model elliptical objects with a Gaussian function, and extend the existing Gaussian Proposal Network (GPN) architecture by adding the region-of-interest pooling and regression branches, as well as using the Wasserstein distance as the loss function to predict the precise locations of elliptical objects. Our proposed method has promising results on the lumber knot dataset: knots are detected with an average intersection over union of 73.05%, compared to 63.63% for general purpose detectors. Specific to the lumber application, we also propose an algorithm to correct any misalignment in the raw timber images during scanning, and contribute the first open-source lumber knot dataset by labeling the elliptical knots in the preprocessed images.

Submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted at WACV 2021

arXiv:2009.07444 [pdf, other]

doi 10.1073/pnas.2020397118

Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes

Authors: Shihao Yang, Samuel W. K. Wong, S. C. Kou

Abstract: Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data is a vital task in many fields. We propose a fast and accurate method, MAGI (MAnifold-constrained Gaussian process Inference), for this task. MAGI uses a Gaussian process model over time-series data, explicitly conditioned on the manifold constraint that derivatives of the Gaussian process must satisfy the ODE system. By doing so, we completely bypass the need for numerical integration and achieve substantial savings in computational time. MAGI is also suitable for inference with unobserved system components, which often occur in real experiments. MAGI is distinct from existing approaches as we provide a principled statistical construction under a Bayesian framework, which incorporates the ODE system through the manifold constraint. We demonstrate the accuracy and speed of MAGI using realistic examples based on physical experiments.

Submitted 21 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

arXiv:2005.07550 [pdf, other]

doi 10.6339/JDS.202007_18(3).0017

Assessing the impacts of mutations to the structure of COVID-19 spike protein via sequential Monte Carlo

Authors: Samuel W. K. Wong

Abstract: Proteins play a key role in facilitating the infectiousness of the 2019 novel coronavirus. A specific spike protein enables this virus to bind to human cells, and a thorough understanding of its 3-dimensional structure is therefore critical for developing effective therapeutic interventions. However, its structure may continue to evolve over time as a result of mutations. In this paper, we use a data science perspective to study the potential structural impacts due to ongoing mutations in its amino acid sequence. To do so, we identify a key segment of the protein and apply a sequential Monte Carlo sampling method to detect possible changes to the space of low-energy conformations for different amino acid sequences. Such computational approaches can further our understanding of this protein structure and complement laboratory efforts.

Submitted 11 June, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: 15 pages, 4 figures

Journal ref: Journal of Data Science, 2020, 18(3): 511-525

arXiv:2002.03537 [pdf, other]

Calibrating wood products for load duration and rate: A statistical look at three damage models

Authors: Samuel W. K. Wong

Abstract: Lumber and wood-based products are versatile construction materials that are susceptible to weakening as a result of applied stresses. To assess the effects of load duration and rate, experiments have been carried out by applying preset load profiles to sample specimens. This paper studies these effects via a damage modeling approach, by considering three models in the literature: the Gerhards and Foschi accumulated damage models, and a degradation model based on the gamma process. We present a statistical framework for fitting these models to failure time data generated by a combination of ramp and constant load settings, and show how estimation uncertainty can be quantified. The models and methods are illustrated and compared via a novel analysis of a Hemlock lumber dataset. Practical usage of the fitted damage models is demonstrated with an application to long-term reliability prediction under stochastic future loadings.

Submitted 9 February, 2020; originally announced February 2020.

Comments: 17 pages, 5 figures

arXiv:1910.02114 [pdf, other]

A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification

Authors: Katherine C. Kempfert, Yishi Wang, Cuixian Chen, Samuel W. K. Wong

Abstract: Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed in the application of machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown as another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques is evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After applications of DR methods, a linear support vector machine (SVM) is deployed with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.

Submitted 4 October, 2019; originally announced October 2019.

arXiv:1809.05075 [pdf, other]

Where Does Haydn End and Mozart Begin? Composer Classification of String Quartets

Authors: Katherine C. Kempfert, Samuel W. K. Wong

Abstract: For centuries, the history and music of Joseph Franz Haydn and Wolfgang Amadeus Mozart have been compared by scholars. Recently, the growing field of music information retrieval (MIR) has offered quantitative analyses to complement traditional qualitative analyses of these composers. In this MIR study, we classify the composer of Haydn and Mozart string quartets based on the content of their scores. Our contribution is an interpretable statistical and machine learning approach that provides high classification accuracies and musical relevance. We develop novel global features that are automatically computed from symbolic data and informed by musicological Haydn-Mozart comparative studies, particularly relating to the sonata form. Several of these proposed features are found to be important for distinguishing between Haydn and Mozart string quartets. Our Bayesian logistic regression model attains leave-one-out classification accuracies over 84%, higher than prior works and providing interpretations that could aid in assessing musicological claims. Overall, our work can help expand the longstanding dialogue surrounding Haydn and Mozart and exemplify the benefit of interpretable machine learning in MIR, with potential applications to music generation and classification of other classical composers.

Submitted 29 July, 2020; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: 30 pages with 11 pages supplement

arXiv:1804.08553 [pdf, other]

On the circular correlation coefficients for bivariate von Mises distributions on a torus

Authors: Saptarshi Chakraborty, Samuel W. K. Wong

Abstract: This paper studies circular correlations for the bivariate von Mises sine and cosine distributions. These are two simple and appealing models for bivariate angular data with five parameters each that have interpretations comparable to those in the ordinary bivariate normal model. However, the variability and association of the angle pairs cannot be easily deduced from the model parameters unlike the bivariate normal. Thus to compute such summary measures, tools from circular statistics are needed. We derive analytic expressions and study the properties of the Jammalamadaka-Sarma and Fisher-Lee circular correlation coefficients for the von Mises sine and cosine models. Likelihood-based inference of these coefficients from sample data is then presented. The correlation coefficients are illustrated with numerical and visual examples, and the maximum likelihood estimators are assessed on simulated and real data, with comparisons to their non-parametric counterparts. Implementations of these computations for practical use are provided in our R package BAMBI.

Submitted 26 May, 2020; v1 submitted 23 April, 2018; originally announced April 2018.

Comments: 29 pages, 3 figures, 5 tables

arXiv:1708.07804 [pdf, other]

BAMBI: An R package for Fitting Bivariate Angular Mixture Models

Authors: Saptarshi Chakraborty, Samuel W. K. Wong

Abstract: Statistical analyses of directional or angular data have applications in a variety of fields, such as geology, meteorology and bioinformatics. There is substantial literature on descriptive and inferential techniques for univariate angular data, with the bivariate (or more generally, multivariate) cases receiving more attention in recent years. More specifically, the bivariate wrapped normal, von Mises sine and von Mises cosine distributions, and mixtures thereof, have been proposed for practical use. However, there is a lack of software implementing these distributions and the associated inferential techniques. In this article, we introduce BAMBI, an R package for analyzing bivariate (and univariate) angular data. We implement random data generation, density evaluation, and computation of theoretical summary measures (variances and correlation coefficients) for the three aforementioned bivariate angular distributions, as well as two univariate angular distributions: the univariate wrapped normal and the univariate von Mises distribution. The major contribution of BAMBI to statistical computing is in providing Bayesian methods for modeling angular data using finite mixtures of these distributions. We also provide functions for visual and numerical diagnostics and Bayesian inference for the fitted models.
In this article, we first provide a brief review of the distributions and techniques used in BAMBI, then describe the capabilities of the package, and finally conclude with demonstrations of mixture model fitting using BAMBI on the two real datasets included in the package, one univariate and one bivariate.

Submitted 17 March, 2019; v1 submitted 25 August, 2017; originally announced August 2017.

Comments: 63 pages, 25 figures

arXiv:1708.07592 [pdf, other]

Sequential Decision Model for Inference and Prediction on Non-Uniform Hypergraphs with Application to Knot Matching from Computational Forestry

Authors: Seong-Hwan Jun, Samuel W. K. Wong, James V. Zidek, Alexandre Bouchard-Côté

Abstract: In this paper, we consider the knot matching problem arising in computational forestry. The knot matching problem is an important problem that needs to be solved to advance the state of the art in automatic strength prediction of lumber. We show that this problem can be formulated as a quadripartite matching problem and develop a sequential decision model that admits efficient parameter estimation along with a sequential Monte Carlo sampler on graph matching that can be utilized for rapid sampling of graph matching. We demonstrate the effectiveness of our methods on 30 manually annotated boards and present findings from various simulation studies to provide further evidence supporting the efficacy of our methods.

Submitted 24 August, 2017; originally announced August 2017.

Comments: 32 pages, 14 figures, submitted to Annals of Applied Statistics

arXiv:1708.07213 [pdf, ps, other]

The duration of load effect in lumber as stochastic degradation

Authors: Samuel W. K. Wong, James V. Zidek

Abstract: This paper proposes a gamma process for modelling the damage that accumulates over time in the lumber used in structural engineering applications when stress is applied. The model separates the stochastic processes representing features internal to the piece of lumber on the one hand, from those representing external forces due to applied dead and live loads. The model applies those external forces through a time-varying population level function designed for time-varying loads. The application of this type of model, which is standard in reliability analysis, is novel in this context, which has been dominated by accumulated damage models (ADMs) over more than half a century. The proposed model is compared with one of the traditional ADMs. Our statistical results based on a Bayesian analysis of experimental data highlight the limitations of using accelerated testing data to assess long-term reliability, as seen in the wide posterior intervals. This suggests the need for more comprehensive testing in future applications, or to encode appropriate expert knowledge in the priors used for Bayesian analysis.

Submitted 23 August, 2017; originally announced August 2017.

arXiv:1708.03018 [pdf, other]

Dimensional and statistical foundations for accumulated damage models

Authors: Samuel W. K. Wong, James V. Zidek

Abstract: This paper develops a framework for creating damage accumulation models for engineered wood products by invoking the classical theory of non-dimensionalization. The result is a general class of such models. Both the US and Canadian damage accumulation models are revisited. It is shown how the former may be generalized within that framework while deficiencies are discovered in the latter and overcome. Use of modern Bayesian statistical methods for estimating the parameters in these models is proposed along with an illustrative application of these methods to a ramp load dataset.

Submitted 9 August, 2017; originally announced August 2017.

arXiv:1706.04643 [pdf, other]

Bayesian analysis of accumulated damage models in lumber reliability

Authors: Chun-Hao Yang, James V. Zidek, Samuel W. K. Wong

Abstract: Wood products that are subjected to sustained stress over a period of long duration may weaken, and this effect must be considered in models for the long-term reliability of lumber. The damage accumulation approach has been widely used for this purpose to set engineering standards. In this article, we revisit an accumulated damage model and propose a Bayesian framework for analysis. For parameter estimation and uncertainty quantification, we adopt approximate Bayesian computation (ABC) techniques to handle the complexities of the model. We demonstrate the effectiveness of our approach using both simulated and real data, and apply our fitted model to analyze long-term lumber reliability under a stochastic live loading scenario.

Submitted 14 June, 2017; originally announced June 2017.

Showing 1–30 of 30 results for author: Wong, S W K