Showing 1–50 of 99 results for author: Wong, K

Searching in archive stat.
  1. arXiv:2506.22313  [pdf, ps, other]

    stat.ME

    Manifold-Constrained Gaussian Processes for Inference of Mixed-effects Ordinary Differential Equations with Application to Pharmacokinetics

    Authors: Yuxuan Zhao, Samuel W. K. Wong

    Abstract: Pharmacokinetic modeling using ordinary differential equations (ODEs) has an important role in dose optimization studies, where dosing must balance sustained therapeutic efficacy with the risk of adverse side effects. Such ODE models characterize drug plasma concentration over time and allow pharmacokinetic parameters to be inferred, such as drug absorption and elimination rates. For time-course s…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 34 pages, 4 figures

  2. arXiv:2506.20048  [pdf, ps, other]

    stat.ML cs.LG

    A Principled Path to Fitted Distributional Evaluation

    Authors: Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond Ka Wai Wong

    Abstract: In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation -- developed for expectation-based reinforcement learning -- to the distributional OPE setting. We refer to this extension as fitted distributi…

    Submitted 24 June, 2025; originally announced June 2025.

  3. arXiv:2506.09722  [pdf, ps, other]

    stat.ME

    Fully Bayesian Sequential Design for Mean Response Surface Prediction of Heteroscedastic Stochastic Simulations

    Authors: Yuying Huang, Samuel W. K. Wong

    Abstract: We present a fully Bayesian sequential strategy for predicting the mean response surface of heteroscedastic stochastic simulation functions. Leveraging dual Gaussian processes as the surrogate model and a criterion based on empirical expected integrated mean-square prediction error, our approach sequentially selects informative design points while fully accounting for parameter uncertainty. Sequen…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 37 pages, 8 figures

  4. arXiv:2504.13467  [pdf, ps, other]

    stat.ME

    Efficient Estimation under Multiple Missing Patterns via Balancing Weights

    Authors: Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

    Abstract: As one of the most commonly seen data challenges, missing data, in particular, multiple, non-monotone missing patterns, complicates estimation and inference due to the fact that missingness mechanisms are often not missing at random, and conventional methods cannot be applied. Pattern graphs have recently been proposed as a tool to systematically relate various observed patterns in the sample. We…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.08873

  5. arXiv:2503.00002  [pdf, other]

    stat.ME stat.AP stat.CO

    Failure of Optimal Design Theory? A Case Study in Toxicology Using Sequential Robust Optimal Design Framework

    Authors: Elvis Han Cui, Michael Collins, Jessica Munson, Weng Kee Wong

    Abstract: This paper presents a quasi-sequential optimal design framework for toxicology experiments, specifically applied to sea urchin embryos. The authors propose a novel approach combining robust optimal design with adaptive, stage-based testing to improve efficiency in toxicological studies, particularly where traditional uniform designs fall short. The methodology uses statistical models to refine dos…

    Submitted 10 February, 2025; originally announced March 2025.

  6. arXiv:2411.05797  [pdf, other]

    cs.NE stat.CO

    Metaheuristics is All You Need

    Authors: Eliuvish Cuicizion, Haowen Xu, Weng Kee Wong

    Abstract: Optimization plays an important role in tackling public health problems. Animal instincts can be used effectively to solve complex public health management issues by providing optimal or approximately optimal solutions to complicated optimization problems common in public health. BAT algorithm is an exemplary member of a class of nature-inspired metaheuristic optimization algorithms and designed t…

    Submitted 21 March, 2025; v1 submitted 25 October, 2024; originally announced November 2024.

    Comments: 25 pages, many figures

  7. arXiv:2410.11482  [pdf, other]

    stat.ME stat.CO

    Scalable likelihood-based estimation and variable selection for the Cox model with incomplete covariates

    Authors: Ngok Sang Kwok, Kin Yau Wong

    Abstract: Regression analysis with missing data is a long-standing and challenging problem, particularly when there are many missing variables with arbitrary missing patterns. Likelihood-based methods, although theoretically appealing, are often computationally inefficient or even infeasible when dealing with a large number of missing variables. In this paper, we consider the Cox regression model with incom…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 15 pages, 2 figures, 7 tables

  8. arXiv:2409.03572  [pdf, other]

    stat.ME

    Extrinsic Principal Component Analysis

    Authors: Ka Chun Wong, Vic Patrangenaru, Robert L. Paige, Mihaela Pricop Jeckstadt

    Abstract: One develops a fast computational methodology for principal component analysis on manifolds. Instead of estimating intrinsic principal components on an object space with a Riemannian structure, one embeds the object space in a numerical space, and the resulting chord distance is used. This method helps us analyzing high, theoretically even infinite dimensional data, from a new perspective. We defi…

    Submitted 3 October, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  9. arXiv:2406.15170  [pdf, other]

    stat.ME

    Inference for Delay Differential Equations Using Manifold-Constrained Gaussian Processes

    Authors: Yuxuan Zhao, Samuel W. K. Wong

    Abstract: Dynamic systems described by differential equations often involve feedback among system components. When there are time delays for components to sense and respond to feedback, delay differential equation (DDE) models are commonly used. This paper considers the problem of inferring unknown system parameters, including the time delays, from noisy and sparse experimental data observed from the system…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 42 pages, 8 figures

  10. arXiv:2405.12386  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO

    Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression

    Authors: Sisi Shao, Junhyung Park, Weng Kee Wong

    Abstract: General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that…

    Submitted 20 May, 2024; originally announced May 2024.

  11. arXiv:2402.08873  [pdf, ps, other]

    stat.ME

    Balancing Weights for Non-monotone Missing Data

    Authors: Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

    Abstract: Balancing weights have been widely applied to single or monotone missingness due to empirical advantages over likelihood-based methods and inverse probability weighting approaches. This paper considers non-monotone missing data under the complete-case missing variable condition (CCMV), a case of missing not at random (MNAR). Using relationships between each missing pattern and the complete-case su…

    Submitted 12 December, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  12. arXiv:2402.06058  [pdf, other]

    stat.ME

    Mathematical programming tools for randomization purposes in small two-arm clinical trials: A case study with real data

    Authors: Alan R. Vazquez, Weng Kee Wong

    Abstract: Modern randomization methods in clinical trials are invariably adaptive, meaning that the assignment of the next subject to a treatment group uses the accumulated information in the trial. Some of the recent adaptive randomization methods use mathematical programming to construct attractive clinical trials that balance the group features, such as their sizes and covariate distributions of their su…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 36 pages, 12 figures

  13. arXiv:2402.01900  [pdf, other]

    stat.ML cs.LG

    Distributional Off-policy Evaluation with Bellman Residual Minimization

    Authors: Sungee Hong, Zhengling Qi, Raymond K. W. Wong

    Abstract: We study distributional off-policy evaluation (OPE), of which the goal is to learn the distribution of the return for a target policy using offline data generated by a different policy. The theoretical foundation of many existing work relies on the supremum-extended statistical distances such as supremum-Wasserstein distance, which are hard to estimate. In contrast, we study the more manageable ex…

    Submitted 12 March, 2025; v1 submitted 2 February, 2024; originally announced February 2024.

  14. arXiv:2401.10010  [pdf, ps, other]

    stat.ME

    A global kernel estimator for partially linear varying coefficient additive hazards models

    Authors: Hoi Min Ng, Kin Yau Wong

    Abstract: In biomedical studies, we are often interested in the association between different types of covariates and the times to disease events. Because the relationship between the covariates and event times is often complex, standard survival models that assume a linear covariate effect are inadequate. A flexible class of models for capturing complex interaction effects among types of covariates is the…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 27 pages

    MSC Class: 62N02

  15. arXiv:2401.04723  [pdf, other]

    stat.ME

    Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach

    Authors: Shiyu He, Samuel W. K. Wong

    Abstract: We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. T…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 23 pages, 7 figures

  16. arXiv:2312.13044  [pdf, other]

    stat.ME stat.CO

    Particle Gibbs for Likelihood-Free Inference of State Space Models with Application to Stochastic Volatility

    Authors: Zhaoran Hou, Samuel W. K. Wong

    Abstract: State space models (SSMs) are widely used to describe dynamic systems. However, when the likelihood of the observations is intractable, parameter inference for SSMs cannot be easily carried out using standard Markov chain Monte Carlo or sequential Monte Carlo methods. In this paper, we propose a particle Gibbs sampler as a general strategy to handle SSMs with intractable likelihoods in the approxi…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 23 pages

  17. arXiv:2311.03497  [pdf, other]

    stat.AP

    Understanding the Impact of Seasonal Climate Change on Canada's Economy by Region and Sector

    Authors: Shiyu He, Trang Bui, Yuying Huang, Wenling Zhang, Jie Jian, Samuel W. K. Wong, Tony S. Wirjanto

    Abstract: To assess the impact of climate change on the Canadian economy, we investigate and model the relationship between seasonal climate variables and economic growth across provinces and economic sectors. We further provide projections of climate change impacts up to the year 2050, taking into account the diverse climate change patterns and economic conditions across Canada. Our results indicate that r…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 25 pages, 7 figures

  18. arXiv:2310.20537  [pdf, other]

    stat.ME stat.ML

    Directed Cyclic Graph for Causal Discovery from Multivariate Functional Data

    Authors: Saptarshi Roy, Raymond K. W. Wong, Yang Ni

    Abstract: Discovering causal relationship using multivariate functional data has received a significant amount of attention very recently. In this article, we introduce a functional linear structural equation model for causal structure learning when the underlying graph involving the multivariate functions may have cycles. To enhance interpretability, our model involves a low-dimensional causal embedded spa…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 36 pages, 2 figures, 7 tables

  19. arXiv:2310.15070  [pdf, ps, other]

    stat.ME

    Improving estimation efficiency of case-cohort study with interval-censored failure time data

    Authors: Qingning Zhou, Kin Yau Wong

    Abstract: The case-cohort design is a commonly used cost-effective sampling strategy for large cohort studies, where some covariates are expensive to measure or obtain. In this paper, we consider regression analysis under a case-cohort study with interval-censored failure time data, where the failure time is only known to fall within an interval instead of being exactly observed. A common approach to analyz…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 tables

  20. arXiv:2310.07801  [pdf, other]

    cs.CV cs.AI stat.ME

    Trajectory-aware Principal Manifold Framework for Data Augmentation and Image Generation

    Authors: Elvis Han Cui, Bingbin Li, Yanan Li, Weng Kee Wong, Donghui Wang

    Abstract: Data augmentation for deep learning benefits model training, image transformation, medical imaging analysis and many other fields. Many existing methods generate new samples from a parametric distribution, like the Gaussian, with little attention to generate samples along the data manifold in either the input or feature space. In this paper, we verify that there are theoretical and practical advan…

    Submitted 30 July, 2023; originally announced October 2023.

    Comments: 20 figures

  21. arXiv:2309.08039  [pdf, other]

    stat.ME math.ST

    Flexible Functional Treatment Effect Estimation

    Authors: Jiayi Wang, Raymond K. W. Wong, Xiaoke Zhang, Kwun Chuen Gary Chan

    Abstract: We study treatment effect estimation with functional treatments where the average potential outcome functional is a function of functions, in contrast to continuous treatment effect estimation where the target is a function of real numbers. By considering a flexible scalar-on-function marginal structural model, a weight-modified kernel ridge regression (WMKRR) is adopted for estimation. The weight…

    Submitted 12 November, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  22. arXiv:2306.16909  [pdf, other]

    stat.ME

    A network-based regression approach for identifying subject-specific driver mutations

    Authors: Kin Yau Wong, Donglin Zeng, D. Y. Lin

    Abstract: In cancer genomics, it is of great importance to distinguish driver mutations, which contribute to cancer progression, from causally neutral passenger mutations. We propose a random-effect regression approach to estimate the effects of mutations on the expressions of genes in tumor samples, where the estimation is assisted by a prespecified gene network. The model allows the mutation effects to va…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 23 pages; 9 figures

  23. arXiv:2304.02127  [pdf, other]

    stat.ME

    A Bayesian Collocation Integral Method for Parameter Estimation in Ordinary Differential Equations

    Authors: Mingwei Xu, Samuel W. K. Wong, Peijun Sang

    Abstract: Inferring the parameters of ordinary differential equations (ODEs) from noisy observations is an important problem in many scientific fields. Currently, most parameter estimation methods that bypass numerical integration tend to rely on basis functions or Gaussian processes to approximate the ODE solution and its derivatives. Due to the sensitivity of the ODE solution to its derivatives, these met…

    Submitted 23 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  24. Bayesian Nonlinear Tensor Regression with Functional Fused Elastic Net Prior

    Authors: Shuoli Chen, Kejun He, Shiyuan He, Yang Ni, Raymond K. W. Wong

    Abstract: Tensor regression methods have been widely used to predict a scalar response from covariates in the form of a multiway array. In many applications, the regions of tensor covariates used for prediction are often spatially connected with unknown shapes and discontinuous jumps on the boundaries. Moreover, the relationship between the response and the tensor covariates can be nonlinear. In this articl…

    Submitted 16 February, 2023; originally announced February 2023.

    Journal ref: Technometrics, 65:4, 524-536 (2023)

  25. arXiv:2301.12540  [pdf, other]

    stat.ML cs.LG

    Implicit Regularization for Group Sparsity

    Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

    Abstract: We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In…

    Submitted 29 January, 2023; originally announced January 2023.

    Comments: accepted by ICLR 2023

  26. arXiv:2301.12302  [pdf, other]

    stat.AP

    A Kriging Metamodel with Adaptive Sampling for Seismic Evaluation of Podium Buildings

    Authors: Yuying Huang, Zhiyong Chen, Samuel W. K. Wong

    Abstract: In this paper, nonlinear time-history dynamic analyses of selected earthquake ground motions are conducted on designated wood-frame podium buildings and the resulting inter-story drifts are analyzed. We aim to construct a reliable region where performance-based seismic design criteria are met, such that a two-step analysis procedure can be used with high confidence. We develop a kriging metamodel…

    Submitted 28 January, 2023; originally announced January 2023.

    Comments: 14 pages, 2 figures

  27. arXiv:2210.14216  [pdf, other]

    stat.ME

    Estimating Boltzmann Averages for Protein Structural Quantities Using Sequential Monte Carlo

    Authors: Zhaoran Hou, Samuel W. K. Wong

    Abstract: Sequential Monte Carlo (SMC) methods are widely used to draw samples from intractable target distributions. Particle degeneracy can hinder the use of SMC when the target distribution is highly constrained or multimodal. As a motivating application, we consider the problem of sampling protein structures from the Boltzmann distribution. This paper proposes a general SMC method that propagates multip…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 20 pages

  28. arXiv:2210.13323  [pdf, other]

    q-bio.PE stat.AP

    A Comparative Study of Compartmental Models for COVID-19 Transmission in Ontario, Canada

    Authors: Yuxuan Zhao, Samuel W. K. Wong

    Abstract: The number of confirmed COVID-19 cases reached over 1.3 million in Ontario, Canada by June 4, 2022. The continued spread of the virus underlying COVID-19 has been spurred by the emergence of variants since the initial outbreak in December, 2019. Much attention has thus been devoted to tracking and modelling the transmission of COVID-19. Compartmental models are commonly used to mimic epidemic tran…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 26 pages, 8 figures

  29. arXiv:2206.12891  [pdf, other]

    stat.ME

    Hierarchical nuclear norm penalization for multi-view data

    Authors: Sangyoon Yi, Raymond K. W. Wong, Irina Gaynanova

    Abstract: The prevalence of data collected on the same set of samples from multiple sources (i.e., multi-view data) has prompted significant development of data integration methods based on low-rank matrix factorizations. These methods decompose signal matrices from each view into the sum of shared and individual structures, which are further used for dimension reduction, exploratory analyses, and quantifyi…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: 39 pages, 10 figures, 3 tables

  30. arXiv:2203.12913  [pdf, other]

    cs.AI stat.ML

    k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations

    Authors: Ka Wong, Praveen Paritosh

    Abstract: Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many natural language processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, the data reliability is und…

    Submitted 24 March, 2022; originally announced March 2022.

  31. arXiv:2203.06066  [pdf, other]

    stat.CO

    MAGI: A Package for Inference of Dynamic Systems from Noisy and Sparse Data via Manifold-constrained Gaussian Processes

    Authors: Samuel W. K. Wong, Shihao Yang, S. C. Kou

    Abstract: This article presents the MAGI software package for the inference of dynamic systems. The focus of MAGI is on dynamics modeled by nonlinear ordinary differential equations with unknown parameters. While such models are widely used in science and engineering, the available experimental data for parameter estimation may be noisy and sparse. Furthermore, some system components may be entirely unobser…

    Submitted 16 October, 2023; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: 47 pages, 10 figures

  32. arXiv:2201.07775  [pdf, other]

    stat.AP q-bio.BM

    Monte Carlo sampling of flexible protein structures: an application to the SARS-CoV-2 omicron variant

    Authors: Samuel W. K. Wong

    Abstract: Proteins can exhibit dynamic structural flexibility as they carry out their functions, especially in binding regions that interact with other molecules. For the key SARS-CoV-2 spike protein that facilitates COVID-19 infection, studies have previously identified several such highly flexible regions with therapeutic importance. However, protein structures available from the Protein Data Bank are pre…

    Submitted 4 February, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 20 pages, 4 figures

  33. arXiv:2201.03464  [pdf, other]

    stat.AP

    Knots and their effect on the tensile strength of lumber: a case study

    Authors: Shuxian Fan, Samuel W. K. Wong, James V. Zidek

    Abstract: When assessing the strength of sawn lumber for use in engineering applications, the sizes and locations of knots are an important consideration. Knots are the most common visual characteristics of lumber, that result from the growth of tree branches. Large individual knots, as well as clusters of distinct knots, are known to have strength-reducing effects. However, industry grading rules that gove…

    Submitted 14 February, 2023; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: 20 pages, 4 figures

  34. arXiv:2111.14623  [pdf, other]

    cs.LG cs.CY stat.AP

    An Overview of Healthcare Data Analytics With Applications to the COVID-19 Pandemic

    Authors: Zhe Fei, Yevgen Ryeznik, Oleksandr Sverdlov, Chee Wei Tan, Weng Kee Wong

    Abstract: In the era of big data, standard analysis tools may be inadequate for making inference and there is a growing need for more efficient and innovative ways to collect, process, analyze and interpret the massive and complex data. We provide an overview of challenges in big data problems and describe how innovative analytical methods, machine learning tools and metaheuristics can tackle general health…

    Submitted 25 November, 2021; originally announced November 2021.

    Journal ref: IEEE TRANSACTIONS ON BIG DATA, 12 August 2021

  35. arXiv:2110.11896  [pdf, other]

    stat.AP

    Multimodel Bayesian Analysis of Load Duration Effects in Lumber Reliability

    Authors: Yunfeng Yang, Martin Lysy, Samuel W. K. Wong

    Abstract: This paper evaluates the reliability of lumber, accounting for the duration-of-load (DOL) effect under different load profiles based on a multimodel Bayesian approach. Three individual DOL models previously used for reliability assessment are considered: the US model, the Canadian model, and the Gamma process model. Procedures for stochastic generation of residential, snow, and wind loads are also…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: 15 pages, 2 figures

  36. Evaluating the Impact of State-Level Public Masking Mandates on New COVID-19 Cases and Deaths in the United States: A Demonstration of the Causal Roadmap

    Authors: Angus K. Wong, Laura B. Balzer

    Abstract: At a national-level, we sought to investigate the effect of public masking mandates on COVID-19 in Fall 2020. Specifically, we aimed to evaluate how the relative growth of COVID-19 cases and deaths would have differed if all states had issued a mandate to mask in public by September 1, 2020 versus if all states had delayed issuing such a mandate. To do so, we applied the Causal Roadmap, a formal f…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 34 total page (including supp materials)

    Journal ref: Epidemiology, December 8, 2021

  37. arXiv:2109.04640  [pdf, other]

    cs.LG stat.ME

    Projected State-action Balancing Weights for Offline Reinforcement Learning

    Authors: Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

    Abstract: Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on the value estimation of a target policy based on pre-collected data generated from a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and t…

    Submitted 9 June, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

  38. arXiv:2108.05574  [pdf, other]

    stat.ML cs.LG

    Implicit Sparse Regularization: The Impact of Depth and Early Stopping

    Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

    Abstract: In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon…

    Submitted 26 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: 32 pages, accepted by NeurIPS 2021. arXiv admin note: text overlap with arXiv:1909.05122 by other authors

  39. arXiv:2106.07393  [pdf, other]

    stat.AP cs.AI cs.SI

    Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability

    Authors: Ka Wong, Praveen Paritosh, Lora Aroyo

    Abstract: We present a new approach to interpreting IRR that is empirical and contextualized. It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. We call this approach the xRR framework. We opensource a replication dataset of 4 million human judgements of facial expressions and analyze it wi…

    Submitted 11 June, 2021; originally announced June 2021.

  40. arXiv:2106.05850  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Matrix Completion with Model-free Weighting

    Authors: Jiayi Wang, Raymond K. W. Wong, Xiaojun Mao, Kwun Chuen Gary Chan

    Abstract: In this paper, we propose a novel method for matrix completion under general non-uniform missing structures. By controlling an upper bound of a novel balancing error, we construct weights that can actively adjust for the non-uniformity in the empirical risk without explicitly modeling the observation probabilities, and can be computed efficiently via convex optimization. The recovered matrix based…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  41. arXiv:2105.14647  [pdf, ps, other]

    stat.ME

    Orthogonal Subsampling for Big Data Linear Regression

    Authors: Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

    Abstract: The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal subsampling (OSS) approach for big data with a focus on linear regression models. The approach is inspired by the fact that an orthogonal array of two levels provide…

    Submitted 30 May, 2021; originally announced May 2021.

  42. arXiv:2105.08835  [pdf, ps, other]

    q-bio.BM stat.AP

    Conformational variability of loops in the SARS-CoV-2 spike protein

    Authors: Samuel W. K. Wong, Zongjun Liu

    Abstract: The SARS-CoV-2 spike (S) protein facilitates viral infection, and has been the focus of many structure determination efforts. Its flexible loop regions are known to be involved in protein binding and may adopt multiple conformations. This paper identifies the S protein loops and studies their conformational variability based on the available Protein Data Bank (PDB) structures. While most loops had…

    Submitted 13 October, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: 24 pages

  43. arXiv:2104.10878  [pdf, other]

    stat.AP q-bio.PE

    Comparing regional and provincial-wide COVID-19 models with physical distancing in British Columbia

    Authors: Geoffrey McGregor, Jennifer Tippett, Andy T. S. Wan, Mengxiao Wang, Samuel W. K. Wong

    Abstract: We study the effects of physical distancing measures for the spread of COVID-19 in regional areas within British Columbia, using the reported cases of the five provincial Health Authorities. Building on the Bayesian epidemiological model of Anderson et al. (2020), we propose a hierarchical regional Bayesian model with time-varying regional parameters between March to December of 2020. In the absen…

    Submitted 13 November, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: 35 pages, 16 figures

    Journal ref: AIMS Mathematics, 2022, 7(4): 6743-6778

  44. arXiv:2104.10041  [pdf, other]

    cs.NE cs.AI stat.AP stat.CO

    Particle swarm optimization in constrained maximum likelihood estimation a case study

    Authors: Elvis Cui, Dongyuan Song, Weng Kee Wong

    Abstract: The aim of this paper is to apply two types of particle swarm optimization, global best and local best PSO, to a constrained maximum likelihood estimation problem in pseudotime analysis, a sub-field in bioinformatics. The results have shown that particle swarm optimization is extremely useful and efficient when the optimization problem is non-differentiable and non-convex so that analytical solution can…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 11 pages, 7 figures

  45. arXiv:2103.03437  [pdf, other]

    stat.ME

    Estimation of Partially Conditional Average Treatment Effect by Hybrid Kernel-covariate Balancing

    Authors: Jiayi Wang, Raymond K. W. Wong, Shu Yang, Kwun Chuen Gary Chan

    Abstract: We study nonparametric estimation for the partially conditional average treatment effect, defined as the treatment effect function over an interested subset of confounders. We propose a hybrid kernel weighting estimator where the weights aim to control the balancing error of any function of the confounders from a reproducing kernel Hilbert space after kernel smoothing over the subset of interested…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: 19 pages, 2 figures

  46. arXiv:2101.02304  [pdf, other]

    stat.AP q-bio.BM

    Statistical challenges in the analysis of sequence and structure data for the COVID-19 spike protein

    Authors: Shiyu He, Samuel W. K. Wong

    Abstract: As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences i…

    Submitted 30 January, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: 21 pages, 5 figures

  47. arXiv:2011.00442  [pdf, other]

    stat.ME

    Penalized estimation for single-index varying-coefficient models with applications to integrative genomic analysis

    Authors: Hoi Min Ng, Binyan Jiang, Kin Yau Wong

    Abstract: Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoin…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: 18 pages, 8 figures

  48. arXiv:2010.13568  [pdf, other]

    stat.ML cs.LG stat.ME

    CP Degeneracy in Tensor Regression

    Authors: Ya Zhou, Raymond K. W. Wong, Kejun He

    Abstract: Tensor linear regression is an important and useful tool for analyzing tensor data. To deal with high dimensionality, CANDECOMP/PARAFAC (CP) low-rank constraints are often imposed on the coefficient tensor parameter in the (penalized) $M$-estimation. However, we show that the corresponding optimization may not be attainable, and when this happens, the estimator is not well-defined. This is closely…

    Submitted 22 October, 2020; originally announced October 2020.

    Journal ref: IEEE Access, 9:1, 7775-7788 (2021)

  49. arXiv:2009.11452  [pdf, ps, other]

    stat.ME stat.AP

    A Wavelet-Based Independence Test for Functional Data with an Application to MEG Functional Connectivity

    Authors: Rui Miao, Xiaoke Zhang, Raymond K. W. Wong

    Abstract: Measuring and testing the dependency between multiple random functions is often an important task in functional data analysis. In the literature, a model-based method relies on a model which is subject to the risk of model misspecification, while a model-free method only provides a correlation measure which is inadequate to test independence. In this paper, we adopt the Hilbert-Schmidt Independenc…

    Submitted 23 September, 2020; originally announced September 2020.

  50. Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes

    Authors: Shihao Yang, Samuel W. K. Wong, S. C. Kou

    Abstract: Parameter estimation for nonlinear dynamic system models, represented by ordinary differential equations (ODEs), using noisy and sparse data is a vital task in many fields. We propose a fast and accurate method, MAGI (MAnifold-constrained Gaussian process Inference), for this task. MAGI uses a Gaussian process model over time-series data, explicitly conditioned on the manifold constraint that deri…

    Submitted 21 February, 2021; v1 submitted 15 September, 2020; originally announced September 2020.