-
The Curious Problem of the Normal Inverse Mean
Authors:
Soham Ghosh,
Uttaran Chatterjee,
Jyotishka Datta
Abstract:
In astronomical observations, the estimation of distances from parallaxes is a challenging task due to the inherent measurement errors and the non-linear relationship between the parallax and the distance. This study leverages ideas from robust Bayesian inference to tackle these challenges, investigating a broad class of prior densities for estimating distances with a reduced bias and variance. Through theoretical analysis, simulation experiments, and the application to data from the Gaia Data Release 1 (GDR1), we demonstrate that heavy-tailed priors provide more reliable distance estimates, particularly in the presence of large fractional parallax errors. Theoretical results highlight the "curse of a single observation," where the likelihood dominates the posterior, limiting the impact of the prior. Nevertheless, heavy-tailed priors can delay the explosion of posterior risk, offering a more robust framework for distance estimation. The findings suggest that reciprocal invariant priors, with polynomial decay in their tails, such as the Half-Cauchy and Product Half-Cauchy, are particularly well-suited for this task, providing a balance between bias reduction and variance control.
Submitted 27 October, 2024;
originally announced October 2024.
-
Horseshoe-type Priors for Independent Component Estimation
Authors:
Jyotishka Datta,
Nicholas G. Polson
Abstract:
Independent Component Estimation (ICE) has many applications in modern-day machine learning as a feature engineering extraction method. Horseshoe-type priors are used to provide scalable algorithms that enable both point estimates via expectation-maximization (EM) and full posterior sampling via Markov Chain Monte Carlo (MCMC) algorithms. Our methodology also applies to flow-based methods for nonlinear feature extraction and deep learning. We also discuss how to implement conditional posteriors and envelope-based methods for optimization. Through this hierarchy representation, we unify a number of hitherto disparate estimation procedures. We illustrate our methodology and algorithms on a numerical example. Finally, we conclude with directions for future research.
Submitted 1 September, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Nonparametric Bayes multiresolution testing for high-dimensional rare events
Authors:
Jyotishka Datta,
Sayantan Banerjee,
David B. Dunson
Abstract:
In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and how the variant profile changes with the disease subtype. Motivated by this application, we develop multiresolution nonparametric Bayes tests for differential mutation rates across groups. The multiresolution approach yields fast and accurate detection of spatial clusters of rare variants, and our nonparametric Bayes framework provides great flexibility for modeling the intensities of rare variants. Some theoretical properties are also assessed, including weak consistency of our Dirichlet Process-Poisson-Gamma mixture over multiple resolutions. Simulation studies illustrate excellent small sample properties relative to competitors, and we apply the method to detect rare variants related to common variable immunodeficiency from whole exome sequencing data on 215 patients and over 60,027 control subjects.
Submitted 19 January, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Quantile Importance Sampling
Authors:
Jyotishka Datta,
Nicholas G. Polson
Abstract:
In Bayesian inference, the approximation of integrals of the form $\psi = \mathbb{E}_{F}{l(X)} = \int_{\chi} l(\mathbf{x}) \, dF(\mathbf{x})$ is a fundamental challenge. Such integrals are crucial for evidence estimation, which is important for various purposes, including model selection and numerical analysis. The existing strategies for evidence estimation are classified into four categories: deterministic approximation, density estimation, importance sampling, and vertical representation (Llorente et al., 2020). In this paper, we show that the Riemann sum estimator due to Yakowitz (1978) can be used in the context of nested sampling (Skilling, 2006) to achieve an $O(n^{-4})$ rate of convergence, faster than the rate from the usual Ergodic Central Limit Theorem. We provide a brief overview of the literature on the Riemann sum estimators and the nested sampling algorithm and its connections to vertical likelihood Monte Carlo. We provide theoretical and numerical arguments to show how merging these two ideas may result in improved and more robust estimators for evidence estimation, especially in higher dimensional spaces. We also briefly discuss the idea of simulating the Lorenz curve that avoids the problem of intractable $\Lambda$ functions, essential for the vertical representation and nested sampling.
Submitted 25 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Maximum a Posteriori Estimation in Graphical Models Using Local Linear Approximation
Authors:
Ksheera Sagar,
Jyotishka Datta,
Sayantan Banerjee,
Anindya Bhadra
Abstract:
Sparse structure learning in high-dimensional Gaussian graphical models is an important problem in multivariate statistical signal processing, since the sparsity pattern naturally encodes the conditional independence relationship among variables. However, maximum a posteriori (MAP) estimation is challenging under hierarchical prior models, and traditional numerical optimization routines or expectation-maximization algorithms are difficult to implement. To this end, our contribution is a novel local linear approximation scheme that circumvents this issue using a very simple computational algorithm. Most importantly, the condition under which our algorithm is guaranteed to converge to the MAP estimate is explicitly stated and is shown to cover a broad class of completely monotone priors, including the graphical horseshoe. Further, the resulting MAP estimate is shown to be sparse and consistent in the $\ell_2$-norm. Numerical results validate the speed, scalability, and statistical performance of the proposed method.
Submitted 23 September, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Quantifying the Effect of Socio-Economic Predictors and Built Environment on Mental Health Events in Little Rock, AR
Authors:
Alfieri Ek,
Samantha Robinson,
Grant Drawve,
Jyotishka Datta
Abstract:
Proper allocation of law enforcement resources remains a critical issue in crime prediction and prevention that operates by characterizing spatially aggregated crime activities and a multitude of predictor variables of interest. Despite the critical nature of proper resource allocation for mental health incidents, there has been little progress in statistical modeling of the geo-spatial nature of mental health events in Little Rock, Arkansas. In this article, we provide insights into the spatial nature of mental health data from Little Rock, Arkansas between 2015 and 2018, under a supervised spatial modeling framework while extending the popular risk terrain modeling (Caplan et al., 2011, 2015; Drawve, 2016) approach. We provide evidence of spatial clustering and identify the important features influencing such heterogeneity via a spatially informed hierarchy of generalized linear models, spatial regression models and a tree based method, viz., Poisson regression, spatial Durbin error model, Manski model and Random Forest. The insights obtained from these different models are presented here along with their relative predictive performances. The inferential tools developed here can be used in a broad variety of spatial modeling contexts and have the potential to aid both law enforcement agencies and the city in properly allocating resources.
Submitted 11 December, 2022;
originally announced December 2022.
-
Evidence Estimation in Gaussian Graphical Models Using a Telescoping Block Decomposition of the Precision Matrix
Authors:
Anindya Bhadra,
Ksheera Sagar,
David Rowe,
Sayantan Banerjee,
Jyotishka Datta
Abstract:
Marginal likelihood, also known as model evidence, is a fundamental quantity in Bayesian statistics. It is used for model selection using Bayes factors or for empirical Bayes tuning of prior hyper-parameters. Yet, the calculation of evidence has remained a longstanding open problem in Gaussian graphical models. Currently, the only feasible solutions that exist are for special cases such as the Wishart or G-Wishart, in moderate dimensions. We develop an approach based on a novel telescoping block decomposition of the precision matrix that allows the estimation of evidence by application of Chib's technique under a very broad class of priors under mild requirements. Specifically, the requirements are: (a) the priors on the diagonal terms on the precision matrix can be written as gamma or scale mixtures of gamma random variables and (b) those on the off-diagonal terms can be represented as normal or scale mixtures of normal. This includes structured priors such as the Wishart or G-Wishart, and more recently introduced element-wise priors, such as the Bayesian graphical lasso and the graphical horseshoe. Among these, the true marginal is known in an analytically closed form for Wishart, providing a useful validation of our approach. For the general setting of the other three, and several more priors satisfying conditions (a) and (b) above, the calculation of evidence has remained an open question that this article resolves under a unifying framework.
Submitted 30 August, 2024; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Inverse Probability Weighting: from Survey Sampling to Evidence Estimation
Authors:
Jyotishka Datta,
Nicholas Polson
Abstract:
We consider the class of inverse probability weight (IPW) estimators, including the popular Horvitz-Thompson and Hajek estimators used routinely in survey sampling, causal inference and evidence estimation for Bayesian computation. We focus on the 'weak paradoxes' for these estimators due to two counterexamples by Basu [1988] and Wasserman [2004] and investigate the two natural Bayesian answers to this problem: one based on binning and smoothing (a 'Bayesian sieve') and the other based on a conjugate hierarchical model that allows borrowing information via exchangeability. We compare the mean squared errors for the two Bayesian estimators with the IPW estimators for Wasserman's example via simulation studies on a broad range of parameter configurations. We also prove posterior consistency for the Bayes estimators under the missing-completely-at-random assumption and show that it requires fewer assumptions on the inclusion probabilities. We also revisit the connection between the different problems where improved or adaptive IPW estimators will be useful, including survey sampling, evidence estimation strategies such as Conditional Monte Carlo, Riemann sums, trapezoidal rules and vertical likelihood, as well as average treatment effect estimation in causal inference.
Submitted 13 April, 2025; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Merging Two Cultures: Deep and Statistical Learning
Authors:
Anindya Bhadra,
Jyotishka Datta,
Nick Polson,
Vadim Sokolov,
Jianeng Xu
Abstract:
Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data. Traditional statistical modeling is still a dominant strategy for structured tabular data. Deep learning can be viewed through the lens of generalized linear models (GLMs) with composite link functions. Sufficient dimensionality reduction (SDR) and sparsity perform nonlinear feature engineering. We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Thus a general framework for machine learning arises that first generates nonlinear features (a.k.a. factors) via sparse regularization and stochastic gradient optimisation and second uses a stochastic output layer for predictive uncertainty. Rather than using shallow additive architectures as in many statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (a.k.a. features) to which predictive statistical methods can be applied. Thus we achieve the best of both worlds: scalability and fast predictive rule construction together with uncertainty quantification. Sparse regularisation with unsupervised or supervised learning finds the features. We clarify the duality between shallow and wide models such as PCA, PPR, RRR and deep but skinny architectures such as autoencoders, MLPs, CNN, and LSTM. The connection with data transformations is of practical importance for finding good network architectures. By incorporating probabilistic components at the output level we allow for predictive uncertainty. We use deep Gaussian processes for interpolation and ReLU trees for classification. We provide applications to regression, classification and interpolation. Finally, we conclude with directions for future research.
Submitted 21 October, 2021;
originally announced October 2021.
-
Precision Matrix Estimation under the Horseshoe-like Prior-Penalty Dual
Authors:
Ksheera Sagar,
Sayantan Banerjee,
Jyotishka Datta,
Anindya Bhadra
Abstract:
Precision matrix estimation in a multivariate Gaussian model is fundamental to network estimation. Although there exist both Bayesian and frequentist approaches to this, it is difficult to obtain good Bayesian and frequentist properties under the same prior--penalty dual. To bridge this gap, our contribution is a novel prior--penalty dual that closely approximates the graphical horseshoe prior and penalty, and performs well in both Bayesian and frequentist senses. A chief difficulty with the horseshoe prior is a lack of closed form expression of the density function, which we overcome in this article. In terms of theory, we establish posterior convergence rate of the precision matrix that matches the oracle rate, in addition to the frequentist consistency of the MAP estimator. In addition, our results also provide theoretical justifications for previously developed approaches that have been unexplored so far, e.g. for the graphical horseshoe prior. Computationally efficient EM and MCMC algorithms are developed respectively for the penalized likelihood and fully Bayesian estimation problems. In numerical experiments, the horseshoe-based approaches echo their superior theoretical properties by comprehensively outperforming the competing methods. A protein--protein interaction network estimation in B-cell lymphoma is considered to validate the proposed methodology.
Submitted 18 January, 2022; v1 submitted 21 April, 2021;
originally announced April 2021.
-
On Posterior consistency of Bayesian Changepoint models
Authors:
Nilabja Guha,
Jyotishka Datta
Abstract:
While there have been many recent developments in Bayesian model selection and variable selection for high-dimensional linear models, there is not much work in the literature on settings with a change point, unlike the frequentist counterpart. We consider a hierarchical Bayesian linear model where the active set of covariates that affects the observations through a mean model can vary between different time segments. Such structure may arise in the social and economic sciences, for example a sudden change in house prices driven by an external economic factor, or changes in crime rates driven by social and built-environment factors. Using an appropriate adaptive prior, we outline the development of a hierarchical Bayesian methodology that can select the true change point as well as the true covariates, with high probability. We provide the first detailed theoretical analysis for posterior consistency with or without covariates, under suitable conditions. Gibbs sampling techniques provide an efficient computational strategy. We also present a small-sample simulation study and an application to crime forecasting.
Submitted 25 February, 2021;
originally announced February 2021.
-
Group Inverse-Gamma Gamma Shrinkage for Sparse Regression with Block-Correlated Predictors
Authors:
Jonathan Boss,
Jyotishka Datta,
Xin Wang,
Sung Kyun Park,
Jian Kang,
Bhramar Mukherjee
Abstract:
Heavy-tailed continuous shrinkage priors, such as the horseshoe prior, are widely used for sparse estimation problems. However, there is limited work extending these priors to predictors with grouping structures. Of particular interest in this article is regression coefficient estimation where pockets of high collinearity in the covariate space are contained within known covariate groupings. To assuage variance inflation due to multicollinearity, we propose the group inverse-gamma gamma (GIGG) prior, a heavy-tailed prior that can trade off between local and group shrinkage in a data-adaptive fashion. A special case of the GIGG prior is the group horseshoe prior, whose shrinkage profile is correlated within-group such that the regression coefficients marginally have exact horseshoe regularization. We show posterior consistency for regression coefficients in linear regression models and posterior concentration results for mean parameters in sparse normal means models. The full conditional distributions corresponding to GIGG regression can be derived in closed form, leading to straightforward posterior computation. We show that GIGG regression results in low mean-squared error across a wide range of correlation structures and within-group signal densities via simulation. We apply GIGG regression to data from the National Health and Nutrition Examination Survey for associating environmental exposures with liver functionality.
Submitted 21 February, 2021;
originally announced February 2021.
-
FairMixRep : Self-supervised Robust Representation Learning for Heterogeneous Data with Fairness constraints
Authors:
Souradip Chakraborty,
Ekansh Verma,
Saswata Sahoo,
Jyotishka Datta
Abstract:
Representation Learning in a heterogeneous space with mixed variables of numerical and categorical types has interesting challenges due to its complex feature manifold. Moreover, feature learning in an unsupervised setup, without class labels and a suitable learning loss function, adds to the problem complexity. Further, the learned representation and subsequent predictions should not reflect discriminatory behavior towards certain sensitive groups or attributes. The proposed feature map should preserve maximum variations present in the data and needs to be fair with respect to the sensitive variables. We propose, in the first phase of our work, an efficient encoder-decoder framework to capture the mixed-domain information. The second phase of our work focuses on de-biasing the mixed space representations by adding relevant fairness constraints. This ensures minimal information loss between the representations before and after the fairness-preserving projections. Both the information content and the fairness aspect of the final learned representation have been validated through several metrics, on which it shows excellent performance. Our work (FairMixRep) addresses the problem of Mixed Space Fair Representation learning from an unsupervised perspective and learns a Universal representation that is timely, unique, and a novel research contribution.
Submitted 14 October, 2020; v1 submitted 7 October, 2020;
originally announced October 2020.
-
Horseshoe Regularization for Machine Learning in Complex and Deep Models
Authors:
Anindya Bhadra,
Jyotishka Datta,
Yunfan Li,
Nicholas G. Polson
Abstract:
Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian methodology in machine learning, specifically for high-dimensional regression and classification problems. They have achieved remarkable success in computation, and enjoy strong theoretical support. Most of the existing literature has focused on the linear Gaussian case; see Bhadra et al. (2019b) for a systematic survey. The purpose of the current article is to demonstrate that the horseshoe regularization is useful far more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allow one to venture out beyond the comfort zone of the canonical linear regression problems.
Submitted 22 November, 2019; v1 submitted 24 April, 2019;
originally announced April 2019.
-
Joint Mean-Covariance Estimation via the Horseshoe with an Application in Genomic Data Analysis
Authors:
Yunfan Li,
Jyotishka Datta,
Bruce A. Craig,
Anindya Bhadra
Abstract:
Seemingly unrelated regression is a natural framework for regressing multiple correlated responses on multiple predictors. The model is very flexible, with multiple linear regression and covariance selection models being special cases. However, its practical deployment in genomic data analysis under a Bayesian framework is limited due to both statistical and computational challenges. The statistical challenge is that one needs to infer both the mean vector and the inverse covariance matrix, a problem inherently more complex than separately estimating each. The computational challenge is due to the dimensionality of the parameter space that routinely exceeds the sample size. We propose the use of horseshoe priors on both the mean vector and the inverse covariance matrix. This prior has demonstrated excellent performance when estimating a mean vector or inverse covariance matrix separately. The current work shows these advantages are also present when addressing both simultaneously. A full Bayesian treatment is proposed, with a sampling algorithm that is linear in the number of predictors. MATLAB code implementing the algorithm is freely available from github at https://github.com/liyf1988/HS_GHS. Extensive performance comparisons are provided with both frequentist and Bayesian alternatives, and both estimation and prediction performances are verified on a genomic data set.
Submitted 22 July, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
-
Lasso Meets Horseshoe : A Survey
Authors:
Anindya Bhadra,
Jyotishka Datta,
Nicholas G. Polson,
Brandon T. Willard
Abstract:
The goal of this paper is to contrast and survey the major advances in two of the most commonly used high-dimensional techniques, namely, the Lasso and horseshoe regularization. Lasso is a gold standard for predictor selection while horseshoe is a state-of-the-art Bayesian estimator for sparse signals. Lasso is fast and scalable and uses convex optimization whilst the horseshoe is non-convex. Our novel perspective focuses on three aspects: (i) theoretical optimality in high dimensional inference for the Gaussian sparse model and beyond, (ii) efficiency and scalability of computation and (iii) methodological development and performance.
Submitted 3 March, 2019; v1 submitted 30 June, 2017;
originally announced June 2017.
-
Horseshoe Regularization for Feature Subset Selection
Authors:
Anindya Bhadra,
Jyotishka Datta,
Nicholas G. Polson,
Brandon Willard
Abstract:
Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma \in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of the time of state-of-the-art non-convex solvers.
Submitted 22 June, 2017; v1 submitted 23 February, 2017;
originally announced February 2017.
-
Inference on High-Dimensional Sparse Count Data
Authors:
Jyotishka Datta,
David B. Dunson
Abstract:
In a variety of application areas, there is a growing interest in analyzing high dimensional sparse count data, with sparsity exhibited by an over-abundance of zeros and small non-zero counts. Existing approaches for analyzing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to the level and nature of sparsity in the data. We develop a new class of continuous local-global shrinkage priors tailored for sparse counts. Theoretical properties are assessed, including posterior concentration, stronger control on false discoveries in multiple testing, robustness in posterior mean and super-efficiency in estimating the sampling density. Simulation studies illustrate excellent small sample properties relative to competitors. We apply the method to detect rare mutational hotspots in exome sequencing data and to identify cities most impacted by terrorism.
Submitted 14 April, 2016; v1 submitted 14 October, 2015;
originally announced October 2015.
-
Default Bayesian analysis with global-local shrinkage priors
Authors:
Anindya Bhadra,
Jyotishka Datta,
Nicholas G. Polson,
Brandon T. Willard
Abstract:
We provide a framework for assessing the default nature of a prior distribution using the property of regular variation, which we study for global-local shrinkage priors. In particular, we demonstrate the horseshoe priors, originally designed to handle sparsity, also possess regular variation and thus are appropriate for default Bayesian analysis. To illustrate our methodology, we solve a problem of non-informative priors due to Efron (1973), who showed standard flat non-informative priors in high-dimensional normal means model can be highly informative for nonlinear parameters of interest. We consider four such problems and show global-local shrinkage priors such as the horseshoe and horseshoe+ perform as Efron (1973) requires in each case. We find the reason for this lies in the ability of the global-local shrinkage priors to separate a low-dimensional signal embedded in high-dimensional noise, even for nonlinear functions.
Submitted 14 May, 2016; v1 submitted 12 October, 2015;
originally announced October 2015.