-
Feature Preserving Shrinkage on Bayesian Neural Networks via the R2D2 Prior
Authors:
Tsai Hor Chan,
Dora Yan Zhang,
Guosheng Yin,
Lequan Yu
Abstract:
Bayesian neural networks (BNNs) treat neural network weights as random variables, aiming to provide posterior uncertainty estimates and avoid overfitting by performing inference on the posterior weights. However, the selection of appropriate prior distributions remains a challenging task, and BNNs may suffer from catastrophically inflated variance or poor predictive performance when poor choices are made for the priors. Existing BNN designs apply different priors to weights, while the behaviours of these priors make it difficult to sufficiently shrink noisy signals or they are prone to overshrinking important signals in the weights. To alleviate this problem, we propose a novel R2D2-Net, which imposes the R^2-induced Dirichlet Decomposition (R2D2) prior on the BNN weights. The R2D2-Net can effectively shrink irrelevant coefficients towards zero, while preventing key features from over-shrinkage. To approximate the posterior distribution of weights more accurately, we further propose a variational Gibbs inference algorithm that combines the Gibbs updating procedure and gradient-based optimization. This strategy enhances stability and consistency in estimation when the variational objective involving the shrinkage parameters is non-convex. We also analyze the evidence lower bound (ELBO) and the posterior concentration rates from a theoretical perspective. Experiments on both natural and medical image classification and uncertainty estimation tasks demonstrate satisfactory performance of our method.
Submitted 23 May, 2025;
originally announced May 2025.
-
Bayesian analysis of restricted mean survival time adjusted for covariates using pseudo-observations
Authors:
Léa Orsini,
Emmanuel Lesaffre,
Guosheng Yin,
Caroline Brard,
David Dejardin,
Gwénaël Le Teuff
Abstract:
The difference in restricted mean survival time (RMST) is a clinically meaningful measure to quantify treatment effect in randomized controlled trials, especially when the proportional hazards assumption does not hold. Several frequentist methods exist to estimate RMST adjusted for covariates based on modeling and integrating the survival function. A more natural approach may be a regression model on RMST using pseudo-observations, which allows for a direct estimation without modeling the survival function. Only a few Bayesian methods exist, and each requires a model of the survival function. We developed a new Bayesian method that combines the use of pseudo-observations with the generalized method of moments. This offers RMST estimation adjusted for covariates without the need to model the survival function, making it more attractive than existing Bayesian methods. A simulation study was conducted with different time-dependent treatment effects (early, delayed, and crossing survival) and covariate effects, showing that our approach provides valid results, aligns with existing methods, and shows improved precision after covariate adjustment. For illustration, we applied our approach to a phase III trial in prostate cancer, providing estimates of the treatment effect on RMST, comparable to existing methods. In addition, our approach provided the effect of other covariates on RMST and determined the posterior probability that the difference in RMST exceeds any given time threshold for any covariate, allowing for nuanced and interpretable results.
Submitted 27 March, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
Bi-directional Curriculum Learning for Graph Anomaly Detection: Dual Focus on Homogeneity and Heterogeneity
Authors:
Yitong Hao,
Enbo He,
Yue Zhang,
Guisheng Yin
Abstract:
Graph anomaly detection (GAD) aims to identify nodes in a graph that are significantly different from normal patterns. Most previous studies are model-driven, focusing on enhancing the detection effect by improving the model structure. However, these approaches often treat all nodes equally, neglecting the different contributions of various nodes to the training. Therefore, we introduce graph curriculum learning as a simple and effective plug-and-play module to optimize GAD methods. The existing graph curriculum learning mainly focuses on the homogeneity of graphs and treats nodes with high homogeneity as easy nodes. In fact, GAD models can handle not only graph homogeneity but also heterogeneity, which leads to the unsuitability of these existing methods. To address this problem, we propose an innovative Bi-directional Curriculum Learning strategy (BCL), which considers nodes with higher and lower similarity to neighbor nodes as simple nodes in the direction of focusing on homogeneity and focusing on heterogeneity, respectively, and prioritizes their training. Extensive experiments show that BCL can be quickly integrated into existing detection processes and significantly improves the performance of ten GAD models on seven commonly used datasets.
Submitted 23 January, 2025;
originally announced January 2025.
-
Deep Nonparametric Inference for Conditional Hazard Function
Authors:
Wen Su,
Kin-Yat Liu,
Guosheng Yin,
Jian Huang,
Xingqiu Zhao
Abstract:
We propose a novel deep learning approach to nonparametric statistical inference for the conditional hazard function of survival time with right-censored data. We use a deep neural network (DNN) to approximate the logarithm of a conditional hazard function given covariates and obtain a DNN likelihood-based estimator of the conditional hazard function. Such an estimation approach renders model flexibility and hence relaxes structural and functional assumptions on conditional hazard or survival functions. We establish the nonasymptotic error bound and functional asymptotic normality of the proposed estimator. Subsequently, we develop new one-sample tests for goodness-of-fit evaluation and two-sample tests for treatment comparison. Both simulation studies and real application analysis show superior performances of the proposed estimators and tests in comparison with existing methods.
Submitted 23 October, 2024;
originally announced October 2024.
-
Diagnosis and Pathogenic Analysis of Autism Spectrum Disorder Using Fused Brain Connection Graph
Authors:
Lu Wei,
Yi Huang,
Guosheng Yin,
Fode Zhang,
Manxue Zhang,
Bin Liu
Abstract:
We propose a model for diagnosing Autism spectrum disorder (ASD) using multimodal magnetic resonance imaging (MRI) data. Our approach integrates brain connectivity data from diffusion tensor imaging (DTI) and functional MRI (fMRI), employing graph neural networks (GNNs) for fused graph classification. To improve diagnostic accuracy, we introduce a loss function that maximizes inter-class and minimizes intra-class margins. We also analyze network node centrality, calculating degree, subgraph, and eigenvector centralities on a bimodal fused brain graph to identify pathological regions linked to ASD. Two non-parametric tests assess the statistical significance of these centralities between ASD patients and healthy controls. Our results reveal consistency between the tests, yet the identified regions differ significantly across centralities, suggesting distinct physiological interpretations. These findings enhance our understanding of ASD's neurobiological basis and offer new directions for clinical diagnosis.
Submitted 21 September, 2024;
originally announced October 2024.
-
Bayesian generalized method of moments applied to pseudo-observations in survival analysis
Authors:
Léa Orsini,
Caroline Brard,
Emmanuel Lesaffre,
Guosheng Yin,
David Dejardin,
Gwénaël Le Teuff
Abstract:
Bayesian inference for survival regression modeling offers numerous advantages, especially for decision-making and external data borrowing, but demands the specification of the baseline hazard function, which may be a challenging task. We propose an alternative approach that does not need the specification of this function. Our approach combines pseudo-observations to convert censored data into longitudinal data with the Generalized Methods of Moments (GMM) to estimate the parameters of interest from the survival function directly. GMM may be viewed as an extension of the Generalized Estimating Equation (GEE) currently used for frequentist pseudo-observations analysis and can be extended to the Bayesian framework using a pseudo-likelihood function. We assessed the behavior of the frequentist and Bayesian GMM in the new context of analyzing pseudo-observations. We compared their performances to the Cox, GEE, and Bayesian piecewise exponential models through a simulation study of two-arm randomized clinical trials. Frequentist and Bayesian GMM gave valid inferences with similar performances compared to the three benchmark methods, except for small sample sizes and high censoring rates. For illustration, three post-hoc efficacy analyses were performed on randomized clinical trials involving patients with Ewing Sarcoma, producing results similar to those of the benchmark methods. Through a simple application of estimating hazard ratios, these findings confirm the effectiveness of this new Bayesian approach based on pseudo-observations and the generalized method of moments. This offers new insights on using pseudo-observations for Bayesian survival analysis.
Submitted 6 June, 2024;
originally announced June 2024.
-
Delaunay Weighted Two-sample Test for High-dimensional Data by Incorporating Geometric Information
Authors:
Jiaqi Gu,
Ruoxu Tan,
Guosheng Yin
Abstract:
Two-sample hypothesis testing is a fundamental problem with various applications, which faces new challenges in the high-dimensional context. To mitigate the issue of the curse of dimensionality, high-dimensional data are typically assumed to lie on a low-dimensional manifold. To incorporate geometric information in the data, we propose to apply the Delaunay triangulation and develop the Delaunay weight to measure the geometric proximity among data points. In contrast to existing similarity measures that only utilize pairwise distances, the Delaunay weight can take both the distance and direction information into account. A detailed computation procedure to approximate the Delaunay weight for the unknown manifold is developed. We further propose a novel nonparametric test statistic using the Delaunay weight matrix to test whether the underlying distributions of two samples are the same or not. Applied to simulated data, the new test exhibits substantial power gain in detecting differences in principal directions between distributions. The proposed test also shows great power on a real dataset of human face images.
Submitted 4 April, 2024;
originally announced April 2024.
-
Efficient Estimation for Functional Accelerated Failure Time Model
Authors:
Changyu Liu,
Wen Su,
Kin-Yat Liu,
Guosheng Yin,
Xingqiu Zhao
Abstract:
We propose a functional accelerated failure time model to characterize effects of both functional and scalar covariates on the time to event of interest, and provide regularity conditions to guarantee model identifiability. For efficient estimation of model parameters, we develop a sieve maximum likelihood approach where parametric and nonparametric coefficients are bundled with an unknown baseline hazard function in the likelihood function. Not only do the bundled parameters cause immense numerical difficulties, but they also result in new challenges in theoretical development. By developing a general theoretical framework, we overcome the challenges arising from the bundled parameters and derive the convergence rate of the proposed estimator. Furthermore, we prove that the finite-dimensional estimator is $\sqrt{n}$-consistent, asymptotically normal and achieves the semiparametric information bound. The proposed inference procedures are evaluated by extensive simulation studies and illustrated with an application to the sequential organ failure assessment data from the Improving Care of Acute Lung Injury Patients study.
Submitted 7 February, 2024;
originally announced February 2024.
-
Futures Quantitative Investment with Heterogeneous Continual Graph Neural Network
Authors:
Min Hu,
Zhizhong Tan,
Bin Liu,
Guosheng Yin
Abstract:
This study aims to address the challenges of futures price prediction in high-frequency trading (HFT) by proposing a continuous learning factor predictor based on graph neural networks. The model integrates multi-factor pricing theories with real-time market dynamics, effectively bypassing the limitations of existing methods that lack financial theory guidance and ignore various trend signals and their interactions. We propose three heterogeneous tasks, including price moving average regression, price gap regression and change-point detection to trace the short-, intermediate-, and long-term trend factors present in the data. In addition, this study also considers the cross-sectional correlation characteristics of future contracts, where prices of different futures often show strong dynamic correlations. Each variable (future contract) depends not only on its historical values (temporal) but also on the observation of other variables (cross-sectional). To capture these dynamic relationships more accurately, we resort to the spatio-temporal graph neural network (STGNN) to enhance the predictive power of the model. The model employs a continuous learning strategy to simultaneously consider these tasks (factors). Additionally, due to the heterogeneity of the tasks, we propose to calculate parameter importance with mutual information between original observations and the extracted features to mitigate the catastrophic forgetting (CF) problem. Empirical tests on 49 commodity futures in China's futures market demonstrate that the proposed model outperforms other state-of-the-art models in terms of prediction accuracy. Not only does this research promote the integration of financial theory and deep learning, but it also provides a scientific basis for actual trading decisions.
Submitted 19 December, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
GWRBoost: A geographically weighted gradient boosting method for explainable quantification of spatially-varying relationships
Authors:
Han Wang,
Zhou Huang,
Ganmin Yin,
Yi Bao,
Xiao Zhou,
Yong Gao
Abstract:
The geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that classical linear regressions, which compose the GWR model, are prone to underfitting, especially for large-volume and complex nonlinear data, causing inferior comparative performance. Some advanced models, such as the decision tree and the support vector machine, can learn features from complex data more effectively, but they cannot provide explainable quantification for the spatial variation of localized relationships. To address the above issues, we propose a geographically gradient boosting weighted regression model, GWRBoost, that applies the localized additive model and gradient boosting optimization method to alleviate underfitting problems and retains explainable quantification capability for spatially-varying relationships between geographically located variables. Furthermore, we formulate the computation method of the Akaike information score for the proposed model to conduct the comparative analysis with the classic GWR algorithm. Simulation experiments and the empirical case study are applied to prove the efficient performance and practical value of GWRBoost. The results show that our proposed model can reduce the RMSE by 18.3% in parameter estimation accuracy and AICc by 67.3% in the goodness of fit.
Submitted 15 December, 2022; v1 submitted 12 December, 2022;
originally announced December 2022.
-
Causal Effect of Functional Treatment
Authors:
Ruoxu Tan,
Wei Huang,
Zheng Zhang,
Guosheng Yin
Abstract:
We study the causal effect with a functional treatment variable, where practical applications often arise in neuroscience, biomedical sciences, etc. Previous research concerning the effect of a functional variable on an outcome is typically restricted to exploring correlation rather than causality. The generalized propensity score, which is often used to calibrate the selection bias, is not directly applicable to a functional treatment variable due to a lack of definition of probability density function for functional data. We propose three estimators for the average dose-response functional based on the functional linear model, namely, the functional stabilized weight estimator, the outcome regression estimator and the doubly robust estimator, each of which has its own merits. We study their theoretical properties, which are corroborated through extensive numerical experiments. A real data application on electroencephalography data and disease severity demonstrates the practical value of our methods.
Submitted 17 May, 2025; v1 submitted 1 October, 2022;
originally announced October 2022.
-
Oncology Dose Finding Using Approximate Bayesian Computation Design
Authors:
Huaqing Jin,
Wenbin Du,
Guosheng Yin
Abstract:
In the development of new cancer treatment, an essential step is to determine the maximum tolerated dose (MTD) via phase I clinical trials. Generally speaking, phase I trial designs can be classified as either model-based or algorithm-based approaches. Model-based phase I designs are typically more efficient by using all observed data, while there is a potential risk of model misspecification that may lead to unreliable dose assignment and incorrect MTD identification. In contrast, most of the algorithm-based designs are less efficient in using cumulative information, because they tend to focus on the observed data in the neighborhood of the current dose level for dose movement. To use the data more efficiently yet without any model assumption, we propose a novel approximate Bayesian computation (ABC) approach for phase I trial design. Not only is the ABC design free of any dose--toxicity curve assumption, but it can also aggregate all the available information accrued in the trial for dose assignment. Extensive simulation studies demonstrate its robustness and efficiency compared with other phase I designs. We apply the ABC design to the MEK inhibitor selumetinib trial to demonstrate its satisfactory performance. The proposed design can be a useful addition to the family of phase I clinical trial designs due to its simplicity, efficiency and robustness.
Submitted 28 February, 2022;
originally announced March 2022.
-
PCA Rerandomization
Authors:
Hengtao Zhang,
Guosheng Yin,
Donald B. Rubin
Abstract:
Mahalanobis distance between treatment group and control group covariate means is often adopted as a balance criterion when implementing a rerandomization strategy. However, this criterion may not work well for high-dimensional cases because it balances all orthogonalized covariates equally. Here, we propose leveraging principal component analysis (PCA) to identify proper subspaces in which Mahalanobis distance should be calculated. Not only can PCA effectively reduce the dimensionality for high-dimensional cases while capturing most of the information in the covariates, but it also provides computational simplicity by focusing on the top orthogonal components. We show that our PCA rerandomization scheme has desirable theoretical properties on balancing covariates and thereby on improving the estimation of average treatment effects. We also show that this conclusion is supported by numerical studies using both simulated and real examples.
Submitted 24 February, 2021;
originally announced February 2021.
-
Bayesian Knockoff Filter
Authors:
Jiaqi Gu,
Guosheng Yin
Abstract:
In many scientific fields, researchers are interested in discovering features with substantial effect on the response from a large number of features while controlling the proportion of false discoveries. By incorporating the knockoff procedure in the Bayesian framework, we develop the Bayesian knockoff filter (BKF) for selecting features that have important effect on the response. In contrast to the fixed knockoff variables in a frequentist procedure, we allow the knockoff variables to be continuously updated using the Markov chain Monte Carlo. Based on the posterior samples and the elaborated greedy selection procedure, our method can distinguish the truly important features from unimportant ones and the Bayesian false discovery rate can be controlled at a desirable level. Numerical experiments on both synthetic and real data demonstrate the advantages of our BKF over existing knockoff methods and Bayesian variable selection approaches, i.e., the BKF possesses higher power and yields a lower false discovery rate.
Submitted 24 February, 2023; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Unit Information Prior for Adaptive Information Borrowing from Multiple Historical Datasets
Authors:
Huaqing Jin,
Guosheng Yin
Abstract:
In clinical trials, there often exist multiple historical studies for the same or related treatment investigated in the current trial. Incorporating historical data in the analysis of the current study is of great importance, as it can help to gain more information, improve efficiency, and provide a more comprehensive evaluation of treatment. Enlightened by the unit information prior (UIP) concept in the reference Bayesian test, we propose a new informative prior called UIP from an information perspective that can adaptively borrow information from multiple historical datasets. We consider both binary and continuous data and also extend the new UIP methods to linear regression settings. Extensive simulation studies demonstrate that our method is comparable to other commonly used informative priors, while the interpretation of UIP is intuitive and its implementation is relatively easy. One distinctive feature of UIP is that its construction only requires summary statistics commonly reported in the literature rather than the patient-level data. By applying our UIP methods to phase III clinical trials for investigating the efficacy of memantine in Alzheimer's disease, we illustrate its ability of adaptively borrowing information from multiple historical datasets in the real application.
Submitted 1 February, 2021;
originally announced February 2021.
-
Reconstruct Kaplan--Meier Estimator as M-estimator and Its Confidence Band
Authors:
Jiaqi Gu,
Yiwei Fan,
Guosheng Yin
Abstract:
The Kaplan--Meier (KM) estimator, which provides a nonparametric estimate of a survival function for time-to-event data, has wide application in clinical studies, engineering, economics and other fields. The theoretical properties of the KM estimator including its consistency and asymptotic distribution have been extensively studied. We reconstruct the KM estimator as an M-estimator by maximizing a quadratic M-function based on concordance, which can be computed using the expectation--maximization (EM) algorithm. It is shown that the convergent point of the EM algorithm coincides with the traditional KM estimator, offering a new interpretation of the KM estimator as an M-estimator. Theoretical properties including the large-sample variance and limiting distribution of the KM estimator are established using M-estimation theory. Simulations and application on two real datasets demonstrate that the proposed M-estimator is exactly equivalent to the KM estimator, while the confidence interval and band can be derived as well.
Submitted 20 November, 2020;
originally announced November 2020.
-
Adaptive Non-reversible Stochastic Gradient Langevin Dynamics
Authors:
Vikram Krishnamurthy,
George Yin
Abstract:
It is well known that adding any skew symmetric matrix to the gradient of Langevin dynamics algorithm results in a non-reversible diffusion with improved convergence rate. This paper presents a gradient algorithm to adaptively optimize the choice of the skew symmetric matrix. The resulting algorithm involves a non-reversible diffusion algorithm cross coupled with a stochastic gradient algorithm that adapts the skew symmetric matrix. The algorithm uses the same data as the classical Langevin algorithm. A weak convergence proof is given for the optimality of the choice of the skew symmetric matrix. The improved convergence rate of the algorithm is illustrated numerically in Bayesian learning and tracking examples.
Submitted 26 September, 2020;
originally announced September 2020.
-
Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning
Authors:
Vikram Krishnamurthy,
George Yin
Abstract:
This paper develops a novel passive stochastic gradient algorithm. In passive stochastic approximation, the stochastic gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated. Classical passive stochastic gradient algorithms use a kernel that approximates a Dirac delta to weigh the gradients based on how far they are evaluated from the desired point. In this paper we construct a multi-kernel passive stochastic gradient algorithm. The algorithm performs substantially better in high dimensional problems and incorporates variance reduction. We analyze the weak convergence of the multi-kernel algorithm and its rate of convergence. In numerical examples, we study the multi-kernel version of the passive least mean squares (LMS) algorithm for transfer learning to compare the performance with the classical passive version.
Submitted 7 February, 2021; v1 submitted 23 August, 2020;
originally announced August 2020.
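A rough one-dimensional sketch of the classical single-kernel passive scheme described in the abstract is given below. It is not the multi-kernel algorithm of the paper; the quadratic cost, the uniform distribution of evaluation points, the Gaussian kernel, and all parameter values are assumptions made for illustration.

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def passive_sgd(n_steps, step, bandwidth, seed=0):
    """Passive stochastic gradient: gradients arrive at random points x,
    and each one is weighted by how close x is to the current estimate."""
    rng = random.Random(seed)
    theta = 0.0
    trace = []
    for _ in range(n_steps):
        # Evaluation point chosen by the environment, not by the algorithm.
        x = rng.uniform(-3.0, 3.0)
        # Noisy gradient of the assumed cost C(x) = 0.5 * (x - 1)^2 at x.
        noisy_grad = (x - 1.0) + rng.gauss(0.0, 0.5)
        # Kernel weight: large only when x is near theta.
        weight = gaussian_kernel((x - theta) / bandwidth) / bandwidth
        theta -= step * weight * noisy_grad
        trace.append(theta)
    return trace

trace = passive_sgd(n_steps=5000, step=0.05, bandwidth=0.3)
estimate = sum(trace[-1000:]) / 1000  # average of the last iterates
```

With a small bandwidth the kernel behaves like an approximate Dirac delta, so only gradients evaluated near the current iterate move it appreciably; in this toy setting the iterates should settle near the minimiser at 1.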
-
Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms
Authors:
Vikram Krishnamurthy,
George Yin
Abstract:
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(θ)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution proportional to $\exp(R(θ))$. The proposed IRL algorithms use kernel-based passive learning schemes. We also construct multi-kernel passive Langevin algorithms for IRL which are suitable for high dimensional data. The performance of the proposed IRL algorithms is illustrated on examples in adaptive Bayesian learning, logistic regression (high dimensional problem) and constrained Markov decision processes. We prove weak convergence of the proposed IRL algorithms using martingale averaging methods. We also analyze the tracking performance of the IRL algorithms in non-stationary environments where the utility function $R(θ)$ jump changes over time as a slow Markov chain.
Submitted 18 January, 2021; v1 submitted 20 June, 2020;
originally announced June 2020.
-
Demystify Lindley's Paradox by Interpreting P-value as Posterior Probability
Authors:
Guosheng Yin,
Haolun Shi
Abstract:
In the hypothesis testing framework, p-value is often computed to determine rejection of the null hypothesis or not. On the other hand, Bayesian approaches typically compute the posterior probability of the null hypothesis to evaluate its plausibility. We revisit Lindley's paradox (Lindley, 1957) and demystify the conflicting results between Bayesian and frequentist hypothesis testing procedures by casting a two-sided hypothesis as a combination of two one-sided hypotheses along the opposite directions. This can naturally circumvent the ambiguities of assigning a point mass to the null and choices of using local or non-local prior distributions. As p-value solely depends on the observed data without incorporating any prior information, we consider non-informative prior distributions for fair comparisons with p-value. The equivalence of p-value and the Bayesian posterior probability of the null hypothesis can be established to reconcile Lindley's paradox. Extensive simulation studies are conducted with multivariate normal data and random effects models to examine the relationship between the p-value and posterior probability.
Submitted 24 February, 2020;
originally announced February 2020.
-
Nonparametric Functional Approximation with Delaunay Triangulation
Authors:
Yehong Liu,
Guosheng Yin
Abstract:
We propose a differentiable nonparametric algorithm, the Delaunay triangulation learner (DTL), to solve the functional approximation problem on the basis of a $p$-dimensional feature space. By conducting the Delaunay triangulation algorithm on the data points, the DTL partitions the feature space into a series of $p$-dimensional simplices in a geometrically optimal way, and fits a linear model within each simplex. We study its theoretical properties by exploring the geometric properties of the Delaunay triangulation, and compare its performance with other statistical learners in numerical studies.
Submitted 2 June, 2019;
originally announced June 2019.
-
The statistical finite element method (statFEM) for coherent synthesis of observation data and model predictions
Authors:
Mark Girolami,
Eky Febrianto,
Ge Yin,
Fehmi Cirak
Abstract:
The increased availability of observation data from engineering systems in operation poses the question of how to incorporate this data into finite element models. To this end, we propose a novel statistical construction of the finite element method that provides the means of synthesising measurement data and finite element models. The Bayesian statistical framework is adopted to treat all the uncertainties present in the data, the mathematical model and its finite element discretisation. From the outset, we postulate a data-generating model which additively decomposes data into a finite element, a model misspecification and a noise component. Each of the components may be uncertain and is considered as a random variable with a respective prior probability density. The prior of the finite element component is given by a conventional stochastic forward problem. The prior probabilities of the model misspecification and measurement noise, without loss of generality, are assumed to have zero-mean and known covariance structure. Our proposed statistical model is hierarchical in the sense that each of the three random components may depend on non-observable random hyperparameters. Because of the hierarchical structure of the statistical model, Bayes rule is applied on three different levels in turn to infer the posterior densities of the three random components and hyperparameters. On level one, we determine the posterior densities of the finite element component and the true system response using the prior finite element density given by the forward problem and the data likelihood. On the next level, we infer the hyperparameter posterior densities from their respective priors and the marginal likelihood of the first inference problem. Finally, on level three we use Bayes rule to choose the most suitable finite element model in light of the observed data by computing the model posteriors.
Submitted 22 January, 2021; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Adaptive Iterative Hessian Sketch via A-Optimal Subsampling
Authors:
Aijun Zhang,
Hengtao Zhang,
Guosheng Yin
Abstract:
Iterative Hessian sketch (IHS) is an effective sketching method for modeling large-scale data. It was originally proposed by Pilanci and Wainwright (2016; JMLR) based on randomized sketching matrices. However, it is computationally intensive due to the iterative sketch process. In this paper, we analyze the IHS algorithm under the unconstrained least squares problem setting, then propose a deterministic approach for improving IHS via A-optimal subsampling. Our contributions are three-fold: (1) a good initial estimator based on the A-optimal design is suggested; (2) a novel ridged preconditioner is developed for repeated sketching; and (3) an exact line search method is proposed for determining the optimal step length adaptively. Extensive experimental results demonstrate that our proposed A-optimal IHS algorithm outperforms the existing accelerated IHS methods.
Submitted 8 March, 2020; v1 submitted 20 February, 2019;
originally announced February 2019.
-
P-value: A Bless or A Curse for Evidence-Based Studies?
Authors:
Haolun Shi,
Guosheng Yin
Abstract:
As a convention, p-value is often computed in frequentist hypothesis testing and compared with the nominal significance level of 0.05 to determine whether or not to reject the null hypothesis. The smaller the p-value, the more significant the statistical test. We consider both one-sided and two-sided hypotheses in the composite hypothesis setting. For one-sided hypothesis tests, we establish the equivalence of p-value and the Bayesian posterior probability of the null hypothesis, which renders p-value an explicit interpretation of how strongly the data support the null. For two-sided hypothesis tests of a point null, we recast the problem as a combination of two one-sided hypotheses along the opposite directions and put forward the notion of a two-sided posterior probability, which also has an equivalent relationship with the (two-sided) p-value. Extensive simulation studies are conducted to demonstrate the Bayesian posterior probability interpretation for the p-value. Contrary to common criticisms of the use of p-value in evidence-based studies, we justify its utility and reclaim its importance from the Bayesian perspective, and recommend the continual use of p-value in hypothesis testing. After all, p-value is not all that bad.
Submitted 22 September, 2018;
originally announced September 2018.
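The one-sided equivalence stated in the abstract can be checked numerically in a textbook special case: a normal mean with known standard deviation and a flat (non-informative) prior, testing H0: mu <= 0 against H1: mu > 0. The Python sketch below covers only this special case, not the paper's general composite setting; the function names and the numbers are illustrative assumptions.

```python
from statistics import NormalDist

def one_sided_p_value(xbar, sigma, n):
    # Frequentist p-value for H0: mu <= 0 vs H1: mu > 0 with known sigma.
    z = xbar * n ** 0.5 / sigma
    return 1.0 - NormalDist().cdf(z)

def posterior_prob_null(xbar, sigma, n):
    # A flat prior on mu gives the posterior mu | data ~ N(xbar, sigma^2 / n),
    # so P(H0 | data) is that posterior's mass on (-inf, 0].
    posterior = NormalDist(mu=xbar, sigma=sigma / n ** 0.5)
    return posterior.cdf(0.0)

p = one_sided_p_value(xbar=0.3, sigma=1.0, n=25)
post = posterior_prob_null(xbar=0.3, sigma=1.0, n=25)
```

Both quantities reduce to the standard normal tail probability beyond z = xbar * sqrt(n) / sigma, which is why `p` and `post` agree here.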
-
Bayesian Outdoor Defect Detection
Authors:
Fei Jiang,
Guosheng Yin
Abstract:
We introduce a Bayesian defect detector to facilitate defect detection in motion-blurred images of rough-textured surfaces. To enhance the accuracy of Bayesian detection in removing non-defect pixels, we develop a class of reflected non-local prior distributions, which is constructed by using the mode of a distribution to subtract its density. The reflected non-local priors force the Bayesian detector to approach 0 at the non-defect locations. We conduct experimental studies to demonstrate the superior performance of the Bayesian detector in eliminating the non-defect points. We apply the Bayesian detector to motion-blurred drone images, in which the detector successfully identifies the hail damage on the rough surface and substantially enhances the accuracy of the entire defect detection pipeline.
Submitted 30 August, 2018;
originally announced September 2018.
-
Parallel Transport Unfolding: A Connection-based Manifold Learning Approach
Authors:
Max Budninskiy,
Glorian Yin,
Leman Feng,
Yiying Tong,
Mathieu Desbrun
Abstract:
Manifold learning offers nonlinear dimensionality reduction of high-dimensional datasets. In this paper, we bring geometry processing to bear on manifold learning by introducing a new approach based on metric connection for generating a quasi-isometric, low-dimensional mapping from a sparse and irregular sampling of an arbitrary manifold embedded in a high-dimensional space. Geodesic distances of discrete paths over the input pointset are evaluated through "parallel transport unfolding" (PTU) to offer robustness to poor sampling and arbitrary topology. Our new geometric procedure exhibits the same strong resilience to noise as one of the staples of manifold learning, the Isomap algorithm, as it also exploits all pairwise geodesic distances to compute a low-dimensional embedding. While Isomap is limited to geodesically-convex sampled domains, parallel transport unfolding does not suffer from this crippling limitation, resulting in an improved robustness to irregularity and voids in the sampling. Moreover, it involves only simple linear algebra, significantly improves the accuracy of all pairwise geodesic distance approximations, and has the same computational complexity as Isomap. Finally, we show that our connection-based distance estimation can be used for faster variants of Isomap such as L-Isomap.
Submitted 2 November, 2018; v1 submitted 23 June, 2018;
originally announced June 2018.
-
Bayesian data augmentation dose finding with continual reassessment method and delayed toxicity
Authors:
Suyu Liu,
Guosheng Yin,
Ying Yuan
Abstract:
A major practical impediment when implementing adaptive dose-finding designs is that the toxicity outcome used by the decision rules may not be observed shortly after the initiation of the treatment. To address this issue, we propose the data augmentation continual reassessment method (DA-CRM) for dose finding. By naturally treating the unobserved toxicities as missing data, we show that such missing data are nonignorable in the sense that the missingness depends on the unobserved outcomes. The Bayesian data augmentation approach is used to sample both the missing data and model parameters from their posterior full conditional distributions. We evaluate the performance of the DA-CRM through extensive simulation studies and also compare it with other existing methods. The results show that the proposed design satisfactorily resolves the issues related to late-onset toxicities and possesses desirable operating characteristics: treating patients more safely and also selecting the maximum tolerated dose with a higher probability. The new DA-CRM is illustrated with two phase I cancer clinical trials.
Submitted 8 January, 2014;
originally announced January 2014.
-
Bayesian phase I/II adaptively randomized oncology trials with combined drugs
Authors:
Ying Yuan,
Guosheng Yin
Abstract:
We propose a new integrated phase I/II trial design to identify the most efficacious dose combination that also satisfies certain safety requirements for drug-combination trials. We first take a Bayesian copula-type model for dose finding in phase I. After identifying a set of admissible doses, we immediately move the entire set forward to phase II. We propose a novel adaptive randomization scheme to favor assigning patients to more efficacious dose-combination arms. Our adaptive randomization scheme takes into account both the point estimate and variability of efficacy. By using a moving reference to compare the relative efficacy among treatment arms, our method achieves a high resolution to distinguish different arms. We also consider groupwise adaptive randomization when efficacy is late-onset. We conduct extensive simulation studies to examine the operating characteristics of the proposed design, and illustrate our method using a phase I/II melanoma clinical trial.
Submitted 8 August, 2011;
originally announced August 2011.