-
Robust Bayesian high-dimensional variable selection and inference with the horseshoe family of priors
Authors:
Kun Fan,
Srijana Subedi,
Vishmi Ridmika Dissanayake Pathiranage,
Cen Wu
Abstract:
Frequentist robust variable selection has been extensively investigated in high-dimensional regression. Despite this success, developing the corresponding statistical inference procedures remains a challenging task. Recently, tackling this challenge from a Bayesian perspective has received much attention. In the literature, the two-group spike-and-slab priors that can induce exact sparsity have been demonstrated to yield valid inference in robust sparse linear models. Nevertheless, another important category of sparse priors, the horseshoe family of priors, including horseshoe, horseshoe+, and regularized horseshoe priors, has not yet been examined in robust high-dimensional regression. Their performance in variable selection and especially statistical inference in the presence of heavy-tailed model errors is not well understood. In this paper, we address the question by developing robust Bayesian hierarchical models utilizing the horseshoe family of priors along with an efficient Gibbs sampling scheme. We show that, compared with competing methods with alternative sampling strategies such as slice sampling, our proposals lead to superior performance in variable selection, Bayesian estimation and statistical inference. In particular, our numerical studies indicate that even without imposing exact sparsity, the one-group horseshoe priors can still yield valid Bayesian credible intervals under robust high-dimensional linear regression models. Applications of the proposed and alternative methods to real data further illustrate the advantage of the proposed methods.
Submitted 22 July, 2025; v1 submitted 15 July, 2025;
originally announced July 2025.
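For readers unfamiliar with the horseshoe prior named in the abstract above, the following is a minimal sketch of drawing coefficients from its standard form, beta_j | lambda_j, tau ~ Normal(0, (lambda_j * tau)^2) with lambda_j ~ half-Cauchy(0, 1). It illustrates the prior only; the robust likelihood and the Gibbs sampler described in the paper are not reproduced. The function name `sample_horseshoe_prior` and the fixed global scale `tau` are assumptions made for this illustration.

```python
import math
import random


def sample_horseshoe_prior(p, tau=1.0, seed=0):
    """Draw p coefficients from a horseshoe prior with a fixed global scale tau.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    betas = []
    for _ in range(p):
        # Local scale lambda_j ~ half-Cauchy(0, 1): absolute value of a
        # standard Cauchy draw obtained through the inverse CDF.
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
        # beta_j | lambda_j, tau ~ Normal(0, (lambda_j * tau)^2).
        betas.append(rng.gauss(0.0, lam * tau))
    return betas


if __name__ == "__main__":
    draws = sample_horseshoe_prior(p=10, tau=0.1, seed=1)
    print([round(b, 4) for b in draws])
```

Most draws land close to zero while a few are very large, which is the shrink-small, keep-large behaviour that makes this one-group prior usable for sparse regression without an exact point mass at zero.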
-
Balancing the effective sample size in prior across different doses in the curve-free Bayesian decision-theoretic design for dose-finding trials
Authors:
Jiapeng Xu,
Dehua Bi,
Shenghua Kelly Fan,
Bee Leng Lee,
Ying Lu
Abstract:
The primary goal of dose allocation in phase I trials is to minimize patient exposure to subtherapeutic or excessively toxic doses, while accurately recommending a phase II dose that is as close as possible to the maximum tolerated dose (MTD). Fan et al. (2012) introduced a curve-free Bayesian decision-theoretic design (CFBD), which leverages the assumption of a monotonic dose-toxicity relationship without directly modeling dose-toxicity curves. This approach has also been extended to drug combinations for determining the MTD (Lee et al., 2017). Although CFBD has demonstrated improved trial efficiency by using fewer patients while maintaining high accuracy in identifying the MTD, it may artificially inflate the effective sample sizes for the updated prior distributions, particularly at the lowest and highest dose levels. This can lead to either overshooting or undershooting the target dose. In this paper, we propose a modification to CFBD's prior distribution updates that balances effective sample sizes across different doses. Simulation results show that with the modified prior specification, CFBD achieves a more focused dose allocation at the MTD and offers more precise dose recommendations with fewer patients on average. It also demonstrates robustness compared with other well-known dose-finding designs in the literature.
Submitted 20 March, 2025;
originally announced March 2025.
-
Use of Expected Utility (EU) to Evaluate Artificial Intelligence-Enabled Rule-Out Devices for Mammography Screening
Authors:
Kwok Lung Fan,
Yee Lam Elim Thompson,
Weijie Chen,
Craig K. Abbey,
Frank W Samuelson
Abstract:
Background: An artificial intelligence (AI)-enabled rule-out device may autonomously remove patient images unlikely to have cancer from radiologist review. Many published studies evaluate this type of device by retrospectively applying the AI to large datasets and using sensitivity and specificity as the performance metrics. However, these metrics have fundamental shortcomings because they are bound to change in opposite directions under the rule-out application of AI. Method: We reviewed two performance metrics to compare the screening performance between the radiologist-with-rule-out-device and radiologist-without-device workflows: positive/negative predictive values (PPV/NPV) and expected utility (EU). We applied both methods to a recent study that reported improved performance in the radiologist-with-device workflow using a retrospective U.S. dataset. We then applied the EU method to a European study based on the reported recall and cancer detection rates at different AI thresholds to compare the potential utility among different thresholds. Results: For the U.S. study, neither PPV/NPV nor EU can demonstrate significant improvement for any of the algorithm thresholds reported. For the study using European data, we found that EU is lower as AI rules out more patients, including false-negative cases, and reduces the overall screening performance. Conclusions: Due to the nature of the retrospective simulated study design, sensitivity and specificity can be ambiguous in evaluating a rule-out device. We showed that using PPV/NPV or EU can resolve the ambiguity. The EU method can be applied with only recall rates and cancer detection rates, which is convenient as ground truth is often unavailable for non-recalled patients in screening mammography.
Submitted 1 October, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
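As a small illustration of the PPV/NPV metrics mentioned in the abstract above, the sketch below computes them from confusion-matrix counts using the textbook definitions PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The function name `predictive_values` and the example counts are hypothetical and are not drawn from either study; the expected-utility calculation is not covered here.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    tp/fp: recalled patients with/without cancer.
    tn/fn: non-recalled patients without/with cancer.
    Hypothetical helper for illustration only.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("PPV/NPV are undefined when a denominator is zero")
    ppv = tp / (tp + fp)  # fraction of recalled patients who have cancer
    npv = tn / (tn + fn)  # fraction of non-recalled patients who are cancer-free
    return ppv, npv


if __name__ == "__main__":
    # Made-up counts for a screening population of 1000 patients.
    ppv, npv = predictive_values(tp=8, fp=92, tn=898, fn=2)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

Comparing the pair (PPV, NPV) between two workflows avoids the situation where sensitivity and specificity move in opposite directions and give no clear ordering.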
-
Semi-Supervised Anomaly Detection Based on Quadratic Multiform Separation
Authors:
Ko-Hui Michael Fan,
Chih-Chung Chang,
Kuang-Hsiao-Yin Kongguoluo
Abstract:
In this paper we propose a novel method for semi-supervised anomaly detection (SSAD). Our classifier is named QMS22 because it was conceived in 2022 on the framework of quadratic multiform separation (QMS), a recently introduced classification model. QMS22 tackles SSAD by solving a multi-class classification problem involving both the training set and the test set of the original problem. The classification problem intentionally includes classes with overlapping samples. One of the classes contains a mixture of normal samples and outliers, and all other classes contain only normal samples. An outlier score is then calculated for every sample in the test set using the outcome of the classification problem. We also include a performance evaluation of QMS22 against top-performing classifiers using ninety-five benchmark imbalanced datasets from the KEEL repository. These classifiers are BRM (Bagging-Random Miner), OCKRA (One-Class K-means with Randomly-projected features Algorithm), ISOF (Isolation Forest), and ocSVM (One-Class Support Vector Machine). Using the area under the receiver operating characteristic curve as the performance measure, we show that QMS22 significantly outperforms ISOF and ocSVM. Moreover, Wilcoxon signed-rank tests reveal no statistically significant difference between QMS22 and BRM or between QMS22 and OCKRA.
Submitted 17 August, 2022;
originally announced August 2022.
-
Applying data technologies to combat AMR: current status, challenges, and opportunities on the way forward
Authors:
Leonid Chindelevitch,
Elita Jauneikaite,
Nicole E. Wheeler,
Kasim Allel,
Bede Yaw Ansiri-Asafoakaa,
Wireko A. Awuah,
Denis C. Bauer,
Stephan Beisken,
Kara Fan,
Gary Grant,
Michael Graz,
Yara Khalaf,
Veranja Liyanapathirana,
Carlos Montefusco-Pereira,
Lawrence Mugisha,
Atharv Naik,
Sylvia Nanono,
Anthony Nguyen,
Timothy Rawson,
Kessendri Reddy,
Juliana M. Ruzante,
Anneke Schmider,
Roman Stocker,
Leonhardt Unruh,
Daniel Waruingi
, et al. (2 additional authors not shown)
Abstract:
Antimicrobial resistance (AMR) is a growing public health threat, estimated to cause over 10 million deaths per year and cost the global economy 100 trillion USD by 2050 under status quo projections. These losses would mainly result from an increase in the morbidity and mortality from treatment failure, AMR infections during medical procedures, and a loss of quality of life attributed to AMR. Numerous interventions have been proposed to control the development of AMR and mitigate the risks posed by its spread. This paper reviews key aspects of bacterial AMR management and control which make essential use of data technologies such as artificial intelligence, machine learning, and mathematical and statistical modelling, fields that have seen rapid developments in this century. Although data technologies have become an integral part of biomedical research, their impact on AMR management has remained modest. We outline the use of data technologies to combat AMR, detailing recent advancements in four complementary categories: surveillance, prevention, diagnosis, and treatment. We provide an overview on current AMR control approaches using data technologies within biomedical research, clinical practice, and in the "One Health" context. We discuss the potential impact and challenges wider implementation of data technologies is facing in high-income as well as in low- and middle-income countries, and recommend concrete actions needed to allow these technologies to be more readily integrated within the healthcare and public health sectors.
Submitted 11 August, 2022; v1 submitted 5 July, 2022;
originally announced August 2022.
-
Clustering by the Probability Distributions from Extreme Value Theory
Authors:
Sixiao Zheng,
Ke Fan,
Yanxi Hou,
Jianfeng Feng,
Yanwei Fu
Abstract:
Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into coherent subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, but it does not utilize the information of sample distribution or density. By comparison, it would potentially be more beneficial to consider the probability of each sample belonging to a possible cluster. To this end, this paper generalizes k-means to model the distribution of clusters. Our novel clustering algorithm models the distribution of distances to centroids over a threshold by the Generalized Pareto Distribution (GPD) in Extreme Value Theory (EVT). Notably, we propose the concept of centroid margin distance, use GPD to establish a probability model for each cluster, and perform a clustering algorithm based on the covering probability function derived from GPD. GPD k-means thus approaches clustering from a probabilistic perspective. Correspondingly, we also introduce a naive baseline, dubbed Generalized Extreme Value (GEV) k-means. GEV fits the distribution of the block maxima. In contrast, the GPD fits the distribution of the distance to the centroid exceeding a sufficiently large threshold, leading to a more stable performance of GPD k-means. Notably, GEV k-means can also estimate cluster structure and thus performs reasonably well compared with classical k-means. Extensive experiments on synthetic and real datasets demonstrate that GPD k-means outperforms competitors. The code is released on GitHub at https://github.com/sixiaozheng/EVT-K-means.
Submitted 20 February, 2022;
originally announced February 2022.
-
RID-Noise: Towards Robust Inverse Design under Noisy Environments
Authors:
Jia-Qi Yang,
Ke-Bin Fan,
Hao Ma,
De-Chuan Zhan
Abstract:
From an engineering perspective, a design should not only perform well in an ideal condition, but should also resist noise. Such a design methodology, namely robust design, has been widely implemented in industry for product quality control. However, classic robust design requires many evaluations for a single design target, and the results of these evaluations cannot be reused for a new target. To achieve a data-efficient robust design, we propose Robust Inverse Design under Noise (RID-Noise), which can utilize existing noisy data to train a conditional invertible neural network (cINN). Specifically, we estimate the robustness of a design parameter by its predictability, measured by the prediction error of a forward neural network. We also define a sample-wise weight, which can be used in the maximum weighted likelihood estimation of an inverse model based on a cINN. With visual results from experiments, we show how RID-Noise works by learning the distribution and robustness from data. Further experiments on several real-world benchmark tasks with noise confirm that our method is more effective than other state-of-the-art inverse design methods. Code and supplementary material are publicly available at https://github.com/ThyrixYang/rid-noise-aaai22
Submitted 7 December, 2021;
originally announced December 2021.
-
Quadratic Multiform Separation: A New Classification Model in Machine Learning
Authors:
Ko-Hui Michael Fan,
Chih-Chung Chang,
Kuang-Hsiao-Yin Kongguoluo
Abstract:
In this paper we present a new classification model in machine learning. Our result is threefold: 1) The model produces comparable predictive accuracy to that of most common classification models. 2) It runs significantly faster than most common classification models. 3) It has the ability to identify a portion of unseen samples for which class labels can be found with much higher predictive accuracy. Currently there are several patents pending on the proposed model.
Submitted 17 August, 2022; v1 submitted 10 October, 2021;
originally announced October 2021.
-
Sparse group variable selection for gene-environment interactions in the longitudinal study
Authors:
Fei Zhou,
Xi Lu,
Jie Ren,
Kun Fan,
Shuangge Ma,
Cen Wu
Abstract:
Penalized variable selection for high-dimensional longitudinal data has received much attention because it accounts for the correlation among repeated measurements and provides additional and essential information for improved identification and prediction performance. Despite this success, the potential of penalization methods for accommodating structured sparsity in longitudinal studies is far from fully understood. In this article, we develop a sparse group penalization method to conduct the bi-level gene-environment (G$\times$E) interaction study under the repeatedly measured phenotype. Within the quadratic inference function (QIF) framework, the proposed method can achieve simultaneous identification of main and interaction effects on both the group and individual levels. Simulation studies have shown that the proposed method outperforms major competitors. In the case study of asthma data from the Childhood Asthma Management Program (CAMP), we conduct a G$\times$E study using high-dimensional SNP data as the genetic factor and the longitudinal trait, forced expiratory volume in one second (FEV1), as the phenotype. Our method leads to improved prediction and identification of main and interaction effects with important implications.
Submitted 18 July, 2021;
originally announced July 2021.
-
Identifying Gene-environment interactions with robust marginal Bayesian variable selection
Authors:
Xi Lu,
Kun Fan,
Jie Ren,
Cen Wu
Abstract:
In high-throughput genetics studies, an important aim is to identify gene-environment interactions associated with clinical outcomes. Recently, multiple marginal penalization methods have been developed and shown to be effective in G$\times$E studies. However, within the Bayesian framework, marginal variable selection has not received much attention. In this study, we propose a novel marginal Bayesian variable selection method for G$\times$E studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. With the incorporation of spike-and-slab priors, we have implemented the Gibbs sampler based on MCMC. The proposed method outperforms a number of alternatives in extensive simulation studies. The utility of the marginal robust Bayesian variable selection method has been further demonstrated in case studies using data from the Nurses' Health Study (NHS). Some of the main and interaction effects identified in the real data analysis have important biological implications.
Submitted 23 February, 2021;
originally announced February 2021.
-
Adversarial Feature Matching for Text Generation
Authors:
Yizhe Zhang,
Zhe Gan,
Kai Fan,
Zhi Chen,
Ricardo Henao,
Dinghan Shen,
Lawrence Carin
Abstract:
The Generative Adversarial Network (GAN) has achieved great success in generating realistic (real-valued) synthetic data. However, convergence issues and difficulties dealing with discrete data hinder the applicability of GAN to text. We propose a framework for generating realistic text via adversarial training. We employ a long short-term memory network as generator, and a convolutional network as discriminator. Instead of using the standard objective of GAN, we propose matching the high-dimensional latent feature distributions of real and synthetic sentences, via a kernelized discrepancy metric. This eases adversarial training by alleviating the mode-collapsing problem. Our experiments show superior performance in quantitative evaluation, and demonstrate that our model can generate realistic-looking sentences.
Submitted 18 November, 2017; v1 submitted 12 June, 2017;
originally announced June 2017.
-
Unifying the Stochastic Spectral Descent for Restricted Boltzmann Machines with Bernoulli or Gaussian Inputs
Authors:
Kai Fan
Abstract:
Stochastic gradient descent based algorithms are typically used as the general optimization tools for most deep learning models. A Restricted Boltzmann Machine (RBM) is a probabilistic generative model that can be stacked to construct deep architectures. For RBMs with Bernoulli inputs, non-Euclidean algorithms such as stochastic spectral descent (SSD) have been specifically designed to speed up convergence by making better use of the gradient estimated by sampling methods. However, the existing algorithm and the corresponding theoretical justification depend on the assumption that the possible configurations of inputs are finite, as with binary variables. The purpose of this paper is to generalize SSD to Gaussian RBMs, which can model continuous data, without relying on that assumption. We propose gradient descent methods in a non-Euclidean space of parameters by deriving upper bounds on the logarithmic partition function for RBMs based on the Schatten-infinity norm. We empirically show the advantage and improvement of SSD over stochastic gradient descent (SGD).
Submitted 28 March, 2017;
originally announced March 2017.
-
Boosting Variational Inference
Authors:
Fangjian Guo,
Xiangyu Wang,
Kai Fan,
Tamara Broderick,
David B. Dunson
Abstract:
Variational inference (VI) provides fast approximations of a Bayesian posterior in part because it formulates posterior approximation as an optimization problem: to find the closest distribution to the exact posterior over some family of distributions. For practical reasons, the family of distributions in VI is usually constrained so that it does not include the exact posterior, even as a limit point. Thus, no matter how long VI is run, the resulting approximation will not approach the exact posterior. We propose to instead consider a more flexible approximating family consisting of all possible finite mixtures of a parametric base distribution (e.g., Gaussian). For efficient inference, we borrow ideas from gradient boosting to develop an algorithm we call boosting variational inference (BVI). BVI iteratively improves the current approximation by mixing it with a new component from the base distribution family and thereby yields progressively more accurate posterior approximations as more computing time is spent. Unlike a number of common VI variants including mean-field VI, BVI is able to capture multimodality, general posterior covariance, and nonstandard posterior shapes.
Submitted 1 March, 2017; v1 submitted 16 November, 2016;
originally announced November 2016.
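The mixing step described in the abstract above, where the current approximation is combined with one new component from the base family, can be sketched as the convex update q_new = (1 - alpha) * q + alpha * h. The code below shows only this bookkeeping for a one-dimensional Gaussian mixture; how BVI selects the new component and its weight is not shown. The names `add_component` and `mixture_pdf` and the tuple layout `(weight, mean, std)` are assumptions made for this illustration.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate Normal(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def add_component(mixture, new_component, alpha):
    """Return (1 - alpha) * mixture + alpha * new_component.

    mixture: list of (weight, mean, std) tuples whose weights sum to 1.
    new_component: (mean, std) of the Gaussian being mixed in.
    Hypothetical helper; the choice of component and alpha is left to the caller.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Shrink every existing weight so the total stays equal to 1.
    rescaled = [(w * (1.0 - alpha), m, s) for (w, m, s) in mixture]
    mean, std = new_component
    return rescaled + [(alpha, mean, std)]


def mixture_pdf(x, mixture):
    """Density of the finite Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, s) for (w, m, s) in mixture)


if __name__ == "__main__":
    q = [(1.0, 0.0, 1.0)]                   # start from a single Gaussian
    q = add_component(q, (3.0, 0.5), 0.3)   # mix in a second mode
    print(q, mixture_pdf(3.0, q))
```

Because each update only appends a component and rescales weights, the approximation can become multimodal after a single step, which a single Gaussian cannot.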
-
Towards Unifying Hamiltonian Monte Carlo and Slice Sampling
Authors:
Yizhe Zhang,
Xiangyu Wang,
Changyou Chen,
Ricardo Henao,
Kai Fan,
Lawrence Carin
Abstract:
We unify slice sampling and Hamiltonian Monte Carlo (HMC) sampling, demonstrating their connection via the Hamilton-Jacobi equation from Hamiltonian mechanics. This insight enables extension of HMC and slice sampling to a broader family of samplers, called Monomial Gamma Samplers (MGS). We provide a theoretical analysis of the mixing performance of such samplers, proving that in the limit of a single parameter, the MGS draws decorrelated samples from the desired target distribution. We further show that as this parameter tends toward this limit, performance gains are achieved at a cost of increasing numerical difficulty and some practical convergence issues. Our theoretical results are validated with synthetic data and real-world applications.
Submitted 10 January, 2018; v1 submitted 25 February, 2016;
originally announced February 2016.
-
High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
Authors:
Chunyuan Li,
Changyou Chen,
Kai Fan,
Lawrence Carin
Abstract:
Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because modern Bayesian methods can yield scalable learning and inference while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC algorithms (SG-MCMC) are a family of diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of interest with a momentum and a thermostat variable to maintain stationary distributions as target posterior distributions. As the number of variables in a continuous-time diffusion increases, its numerical approximation error becomes a practical bottleneck, so a better numerical integrator is desirable. To this end, we propose the use of an efficient symmetric splitting integrator in mSGNHT, instead of the traditional Euler integrator. We demonstrate that the proposed scheme is more accurate and more robust, and that it converges faster. These properties are shown to be desirable in Bayesian deep learning. Extensive experiments on two canonical models and their deep extensions demonstrate that the proposed scheme improves general Bayesian posterior sampling, particularly for deep models.
Submitted 23 December, 2015;
originally announced December 2015.
-
$k$-means: Fighting against Degeneracy in Sequential Monte Carlo with an Application to Tracking
Authors:
Kai Fan,
Katherine Heller
Abstract:
For regular particle filter algorithms or Sequential Monte Carlo (SMC) methods, the initial weights are traditionally dependent on the proposal distribution, the posterior distribution at the current timestamp in the sampled sequence, and the target is the posterior distribution of the previous timestamp. This is technically correct, but leads to algorithms which usually have practical issues with degeneracy, where all particles eventually collapse onto a single particle. In this paper, we propose and evaluate using $k$-means clustering to attack and even take advantage of this degeneracy. Specifically, we propose a Stochastic SMC algorithm which initializes the set of $k$ means, providing the initial centers chosen from the collapsed particles. To fight against degeneracy, we adjust the regular SMC weights, mediated by cluster proportions, and then correct them to retain the same expectation as before. We experimentally demonstrate that our approach performs better than vanilla algorithms.
Submitted 12 November, 2015;
originally announced November 2015.
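The degeneracy described in the abstract above, where the particle weights concentrate on a single particle, is commonly monitored with the effective sample size, ESS = 1 / sum(w_i^2) over normalized weights. The sketch below computes this standard diagnostic; it is not the $k$-means-based weight adjustment proposed in the paper, and the function name `effective_sample_size` is an assumption made for this illustration.

```python
def effective_sample_size(weights):
    """Return 1 / sum(w_i^2) for the normalized version of `weights`.

    The value equals the number of particles when weights are uniform and
    drops to 1 when a single particle carries all the weight.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


if __name__ == "__main__":
    print(effective_sample_size([0.25, 0.25, 0.25, 0.25]))  # healthy particle set
    print(effective_sample_size([1.0, 0.0, 0.0, 0.0]))      # fully degenerate set
```

A common practice is to resample when this value falls below some fraction of the particle count; the threshold is a tuning choice.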
-
Fast Second-Order Stochastic Backpropagation for Variational Inference
Authors:
Kai Fan,
Ziteng Wang,
Jeff Beck,
James Kwok,
Katherine Heller
Abstract:
We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and the variational auto-encoder (VAE). Additionally, we compute bounds on the estimator variance of intractable expectations for the family of Lipschitz continuous functions. Our method is practical, scalable and model-free. We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates.
Submitted 28 March, 2017; v1 submitted 9 September, 2015;
originally announced September 2015.
-
Bayesian Models for Heterogeneous Personalized Health Data
Authors:
Kai Fan,
Allison E. Aiello,
Katherine A. Heller
Abstract:
The purpose of this study is to leverage modern technology (such as mobile or web apps in Beckman et al. (2014)) to enrich epidemiology data and infer the transmission of disease. Homogeneity-related research at the population level has been pursued intensively in previous work. In contrast, we develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infection in a small cell phone community and capture person-specific infection parameters by leveraging a link prior that incorporates additional covariates. We also reexamine the model evolution of the hGCHMM from simple HMMs and LDA, elucidating additional flexibility and interpretability. Due to the non-conjugacy of sparsely coupled HMMs, we design a new approximate distribution, allowing our approach to be more applicable to other application areas. Additionally, we investigate two common link functions, the beta-exponential prior and the sigmoid function, both of which allow the development of a principled Bayesian hierarchical framework for disease transmission. The results of our model allow us to predict the probability of infection for each person on each day, and also to infer personal physical vulnerability and the relevant association with covariates. We demonstrate our approach experimentally on both simulated data and real epidemiological records.
Submitted 31 August, 2015;
originally announced September 2015.
-
A Novel Non-Parametric Approach to Compare Paired General Statistical Distributions between Two Interventions
Authors:
Kang Li,
Kai Fan
Abstract:
Although many measures are used to determine the difference between two groups of observations, such as the mean, the median, and the sample standard deviation, we propose a novel nonparametric transformation method based on the Mallows distance to investigate the location and variance differences between the two groups. We establish the convexity theory of this method, making it a viable alternative for data from any distribution. In addition, we are able to establish similar methods under other distance measures, such as the Kolmogorov-Smirnov distance. We also apply our method to real data.
Submitted 28 October, 2014;
originally announced October 2014.