-
Adaptive finite element type decomposition of Gaussian processes
Authors:
Jaehoan Kim,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
In this paper, we investigate a class of approximate Gaussian processes (GP) obtained by taking a linear combination of compactly supported basis functions with the basis coefficients endowed with a dependent Gaussian prior distribution. This general class includes a popular approach that uses a finite element approximation of the stochastic partial differential equation (SPDE) associated with Matérn GP. We explore another scalable alternative popularly used in the computer emulation literature where the basis coefficients at a lattice are drawn from a Gaussian process with an inverse-Gamma bandwidth. For both approaches, we study concentration rates of the posterior distribution. We demonstrate that the SPDE associated approach with a fixed smoothness parameter leads to a suboptimal rate regardless of how the number of basis functions and bandwidth are chosen when the underlying true function is sufficiently smooth. On the flip side, we show that the latter approach is rate-optimal adaptively over all smoothness levels of the underlying true function if an appropriate prior is placed on the number of basis functions. Efficient computational strategies are developed and numerics are provided to illustrate the theoretical results.
Submitted 29 May, 2025;
originally announced May 2025.
-
Scalable Efficient Inference in Complex Surveys through Targeted Resampling of Weights
Authors:
Snigdha Das,
Dipankar Bandyopadhyay,
Debdeep Pati
Abstract:
Survey data often arises from complex sampling designs, such as stratified or multistage sampling, with unequal inclusion probabilities. When sampling is informative, traditional inference methods yield biased estimators and poor coverage. Classical pseudo-likelihood based methods provide accurate asymptotic inference but lack finite-sample uncertainty quantification and the ability to integrate prior information. Existing Bayesian approaches, like the Bayesian pseudo-posterior estimator and weighted Bayesian bootstrap, have limitations; the former struggles with uncertainty quantification, while the latter is computationally intensive and sensitive to bootstrap replicates. To address these challenges, we propose the Survey-adjusted Weighted Likelihood Bootstrap (S-WLB), which resamples weights from a carefully chosen distribution centered around the underlying sampling weights. S-WLB is computationally efficient, theoretically consistent, and delivers finite-sample uncertainty intervals which are proven to be asymptotically valid. We demonstrate its performance through simulations and applications to nationally representative survey datasets like NHANES and NSDUH.
Submitted 15 April, 2025;
originally announced April 2025.
-
A Generalized Tangent Approximation Framework for Strongly Super-Gaussian Likelihoods
Authors:
Somjit Roy,
Pritam Dey,
Debdeep Pati,
Bani K. Mallick
Abstract:
Tangent approximations form a popular class of variational inference (VI) techniques for Bayesian analysis in intractable non-conjugate models. They are based on the principle of convex duality to construct a minorant of the marginal likelihood, making the problem tractable. Despite its extensive applications, a general methodology for tangent approximation encompassing a large class of likelihoods beyond logit models with provable optimality guarantees is still elusive. In this article, we propose a general Tangent Approximation based Variational InferencE (TAVIE) framework for strongly super-Gaussian (SSG) likelihood functions which includes a broad class of flexible probability models. Specifically, TAVIE obtains a quadratic lower bound of the corresponding log-likelihood, thus inducing conjugacy with Gaussian priors over the model parameters. Under mild assumptions on the data-generating process, we demonstrate the optimality of our proposed methodology in the fractional likelihood setup. Furthermore, we illustrate the empirical performance of TAVIE through extensive simulations and an application on the U.S. 2000 Census real data.
Submitted 7 April, 2025;
originally announced April 2025.
-
Robust Bayesian Inference on Riemannian Submanifold
Authors:
Rong Tang,
Anirban Bhattacharya,
Debdeep Pati,
Yun Yang
Abstract:
Non-Euclidean spaces routinely arise in modern statistical applications such as in medical imaging, robotics, and computer vision, to name a few. While traditional Bayesian approaches are applicable to such settings by considering an ambient Euclidean space as the parameter space, we demonstrate the benefits of integrating manifold structure into the Bayesian framework, both theoretically and computationally. Moreover, existing Bayesian approaches which are designed specifically for manifold-valued parameters are primarily model-based, which are typically subject to inaccurate uncertainty quantification under model misspecification. In this article, we propose a robust model-free Bayesian inference for parameters defined on a Riemannian submanifold, which is shown to provide valid uncertainty quantification from a frequentist perspective. Computationally, we propose a Markov chain Monte Carlo algorithm to sample from the posterior on the Riemannian submanifold, where the mixing time, in the large sample regime, is shown to depend only on the intrinsic dimension of the parameter space instead of the potentially much larger ambient dimension. Our numerical results demonstrate the effectiveness of our approach on a variety of problems, such as reduced-rank multiple quantile regression, principal component analysis, and Fréchet mean estimation.
Submitted 27 October, 2023;
originally announced October 2023.
-
Constrained Reweighting of Distributions: an Optimal Transport Approach
Authors:
Abhisek Chakraborty,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
Submitted 16 January, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors
Authors:
Prateek Jaiswal,
Debdeep Pati,
Anirban Bhattacharya,
Bani K. Mallick
Abstract:
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve stochastic multi-armed bandit problems. We consider a variant of TS, named $α$-TS, where we use a fractional or $α$-posterior ($α\in(0,1)$) instead of the standard posterior distribution. To compute an $α$-posterior, the likelihood in the definition of the standard posterior is tempered with a factor $α$. For $α$-TS we obtain both instance-dependent $\mathcal{O}\left(\sum_{k \neq i^*} Δ_k\left(\frac{\log(T)}{C(α)Δ_k^2} + \frac{1}{2} \right)\right)$ and instance-independent $\mathcal{O}(\sqrt{KT\log K})$ frequentist regret bounds under very mild conditions on the prior and reward distributions, where $Δ_k$ is the gap between the true mean rewards of the $k^{th}$ and the best arms, and $C(α)$ is a known constant. Both the sub-Gaussian and exponential family models satisfy our general conditions on the reward distribution. Our conditions on the prior distribution just require its density to be positive, continuous, and bounded. We also establish another instance-dependent regret upper bound that matches (up to constants) that of improved UCB [Auer and Ortner, 2010]. Our regret analysis carefully combines recent theoretical developments in the non-asymptotic concentration analysis and Bernstein-von Mises type results for the $α$-posterior distribution. Moreover, our analysis does not require additional structural properties such as closed-form posteriors or conjugate priors.
Submitted 12 September, 2023;
originally announced September 2023.
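As a rough illustration of the tempering described in the abstract above, the sketch below runs Thompson sampling with an $α$-posterior on Bernoulli arms. The Bernoulli reward model, the uniform Beta(1, 1) prior, and the function name `alpha_ts_bernoulli` are assumptions made for this example only; they are not taken from the paper, whose analysis does not require conjugacy. Under these assumptions, raising the likelihood to the power $α$ turns the usual Beta(1 + s, 1 + f) posterior into Beta(1 + αs, 1 + αf).

```python
import random


def alpha_ts_bernoulli(true_means, horizon, alpha=0.5, seed=0):
    """Thompson sampling with a fractional (alpha) posterior on Bernoulli arms.

    Illustrative assumptions: Bernoulli rewards and a Beta(1, 1) prior per arm.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Tempering the likelihood by alpha scales the observed counts by alpha,
        # so the alpha-posterior of arm i is Beta(1 + alpha*s_i, 1 + alpha*f_i).
        draws = [
            rng.betavariate(1 + alpha * successes[i], 1 + alpha * failures[i])
            for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Calling `alpha_ts_bernoulli([0.2, 0.8], horizon=2000)` returns the per-arm pull counts; setting `alpha=1.0` recovers standard Thompson sampling under the same assumptions.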
-
Covariate-Assisted Bayesian Graph Learning for Heterogeneous Data
Authors:
Yabo Niu,
Yang Ni,
Debdeep Pati,
Bani K. Mallick
Abstract:
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates. Building on product partition models, we propose a novel covariate-dependent Gaussian graphical model that allows graphs to vary with covariates so that observations whose covariates are similar share a similar undirected graph. To efficiently embed Gaussian graphical models into our proposed framework, we explore both Gaussian likelihood and pseudo-likelihood functions. For Gaussian likelihood, a G-Wishart distribution is used as a natural conjugate prior, and for the pseudo-likelihood, a product of Gaussian-conditionals is used. Moreover, the proposed model has large prior support and is flexible enough to approximate any $ν$-Hölder conditional variance-covariance matrices with $ν\in(0,1]$. We further show that based on the theory of fractional likelihood, the rate of posterior contraction is minimax optimal assuming the true density to be a Gaussian mixture with a known number of components. The efficacy of the approach is demonstrated via simulation studies and an analysis of a protein network for a breast cancer dataset assisted by mRNA gene expression as covariates.
Submitted 15 August, 2023;
originally announced August 2023.
-
Memory Efficient And Minimax Distribution Estimation Under Wasserstein Distance Using Bayesian Histograms
Authors:
Peter Matthew Jacobs,
Lekha Patel,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
We study Bayesian histograms for distribution estimation on $[0,1]^d$ under the Wasserstein $W_v, 1 \leq v < \infty$ distance in the i.i.d. sampling regime. We newly show that when $d < 2v$, histograms possess a special \textit{memory efficiency} property, whereby in reference to the sample size $n$, order $n^{d/2v}$ bins are needed to obtain minimax rate optimality. This result holds for the posterior mean histogram and with respect to posterior contraction: under the class of Borel probability measures and some classes of smooth densities. The attained memory footprint overcomes existing minimax optimal procedures by a polynomial factor in $n$; for example an $n^{1 - d/2v}$ factor reduction in the footprint when compared to the empirical measure, a minimax estimator in the Borel probability measure class. Additionally, constructing both the posterior mean histogram and the posterior itself can be done super--linearly in $n$. Due to the popularity of the $W_1,W_2$ metrics and the coverage provided by the $d < 2v$ case, our results are of most practical interest in the $(d=1,v =1,2), (d=2,v=2), (d=3,v=2)$ settings and we provide simulations demonstrating the theory in several of these instances.
Submitted 19 July, 2023;
originally announced July 2023.
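To make the bin-count scaling in the abstract above concrete, the following sketch builds a posterior mean histogram on $[0,1]$ (so $d = 1$) with roughly $n^{d/2v}$ equal-width bins. The symmetric Dirichlet prior on the bin probabilities, the `prior_mass` parameter, and the function name are assumptions chosen for illustration; the paper's actual prior and constants may differ.

```python
import math


def posterior_mean_histogram(samples, v=1.0, prior_mass=1.0):
    """Posterior mean bin probabilities for data on [0, 1] (d = 1).

    Illustrative assumptions: equal-width bins, a symmetric
    Dirichlet(prior_mass, ..., prior_mass) prior on the bin probabilities,
    and ceil(n ** (d / (2 * v))) bins.
    """
    n = len(samples)
    d = 1
    num_bins = max(1, math.ceil(n ** (d / (2 * v))))
    counts = [0] * num_bins
    for x in samples:
        # Clamp so that x == 1.0 falls into the last bin.
        j = min(int(x * num_bins), num_bins - 1)
        counts[j] += 1
    # Dirichlet-multinomial conjugacy: the posterior mean of bin j is
    # (count_j + prior_mass) / (n + prior_mass * num_bins).
    total = n + prior_mass * num_bins
    return [(c + prior_mass) / total for c in counts]
```

With `v=1.0` the histogram stores about $\sqrt{n}$ numbers rather than the $n$ points of the empirical measure, which is the kind of footprint reduction the abstract refers to.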
-
On the Convergence of Coordinate Ascent Variational Inference
Authors:
Anirban Bhattacharya,
Debdeep Pati,
Yun Yang
Abstract:
As a computational alternative to Markov chain Monte Carlo approaches, variational inference (VI) is becoming more and more popular for approximating intractable posterior distributions in large-scale Bayesian models due to its comparable efficacy and superior efficiency. Several recent works provide theoretical justifications of VI by proving its statistical optimality for parameter estimation under various settings; meanwhile, formal analysis on the algorithmic convergence aspects of VI is still largely lacking. In this paper, we consider the common coordinate ascent variational inference (CAVI) algorithm for implementing the mean-field (MF) VI towards optimizing a Kullback--Leibler divergence objective functional over the space of all factorized distributions. Focusing on the two-block case, we analyze the convergence of CAVI by leveraging the extensive toolbox from functional analysis and optimization. We provide general conditions for certifying global or local exponential convergence of CAVI. Specifically, a new notion of generalized correlation for characterizing the interaction between the constituting blocks in influencing the VI objective functional is introduced, which according to the theory, quantifies the algorithmic contraction rate of two-block CAVI. As illustrations, we apply the developed theory to a number of examples, and derive explicit problem-dependent upper bounds on the algorithmic contraction rate.
Submitted 1 June, 2023;
originally announced June 2023.
-
Fair Clustering via Hierarchical Fair-Dirichlet Process
Authors:
Abhisek Chakraborty,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
The advent of ML-driven decision-making and policy formation has led to an increasing focus on algorithmic fairness. As clustering is one of the most commonly used unsupervised machine learning approaches, there has naturally been a proliferation of literature on {\em fair clustering}. A popular notion of fairness in clustering mandates the clusters to be {\em balanced}, i.e., each level of a protected attribute must be approximately equally represented in each cluster. Building upon the original framework, this literature has rapidly expanded in various aspects. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature which is almost exclusively based on optimizing appropriate objective functions.
Submitted 27 May, 2023;
originally announced May 2023.
-
Blocked Gibbs sampler for hierarchical Dirichlet processes
Authors:
Snigdha Das,
Yabo Niu,
Yang Ni,
Bani K. Mallick,
Debdeep Pati
Abstract:
Posterior computation in hierarchical Dirichlet process (HDP) mixture models is an active area of research in nonparametric Bayes inference of grouped data. Existing literature almost exclusively focuses on the Chinese restaurant franchise (CRF) analogy of the marginal distribution of the parameters, which can mix poorly and has a quadratic complexity with the sample size. A recently developed slice sampler allows for efficient blocked updates of the parameters, but is shown to be statistically unstable in our article. We develop a blocked Gibbs sampler that employs a truncated approximation of the underlying random measures to sample from the posterior distribution of HDP, which produces statistically stable results, is highly scalable with respect to sample size, and is shown to have good mixing. The heart of the construction is to endow the shared concentration parameter with an appropriately chosen gamma prior that allows us to break the dependence of the shared mixing proportions and permits independent updates of certain log-concave random variables in a block. En route, we develop an efficient rejection sampler for these random variables leveraging piece-wise tangent-line approximations.
Submitted 4 August, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Robust probabilistic inference via a constrained transport metric
Authors:
Abhisek Chakraborty,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
Flexible Bayesian models are typically constructed using limits of large parametric models with a multitude of parameters that are often uninterpretable. In this article, we offer a novel alternative by constructing an exponentially tilted empirical likelihood carefully designed to concentrate near a parametric family of distributions of choice with respect to a novel variant of the Wasserstein metric, which is then combined with a prior distribution on model parameters to obtain a robustified posterior. The proposed approach finds applications in a wide variety of robust inference problems, where we intend to perform inference on the parameters associated with the centering distribution in the presence of outliers. Our proposed transport metric enjoys great computational simplicity, exploiting the Sinkhorn regularization for discrete optimal transport problems, and being inherently parallelizable. We demonstrate superior performance of our methodology when compared against state-of-the-art robust Bayesian inference methods. We also demonstrate equivalence of our approach with a nonparametric Bayesian formulation under a suitable asymptotic framework, testifying to its flexibility. The constrained entropy maximization that sits at the heart of our likelihood formulation finds its utility beyond robust Bayesian inference; an illustration is provided in a trustworthy machine learning application.
Submitted 17 March, 2023;
originally announced March 2023.
-
An Approximate Bayesian Approach to Covariate-dependent Graphical Modeling
Authors:
Sutanoy Dasgupta,
Peng Zhao,
Jacob Helwig,
Prasenjit Ghosh,
Debdeep Pati,
Bani K. Mallick
Abstract:
Gaussian graphical models typically assume a homogeneous structure across all subjects, which is often restrictive in applications. In this article, we propose a weighted pseudo-likelihood approach for graphical modeling which allows different subjects to have different graphical structures depending on extraneous covariates. The pseudo-likelihood approach replaces the joint distribution by a product of the conditional distributions of each variable. We cast the conditional distribution as a heteroscedastic regression problem, with covariate-dependent variance terms, to enable information borrowing directly from the data instead of a hierarchical framework. This allows independent graphical modeling for each subject, while retaining the benefits of a hierarchical Bayes model and being computationally tractable. An efficient embarrassingly parallel variational algorithm is developed to approximate the posterior and obtain estimates of the graphs. Using a fractional variational framework, we derive asymptotic risk bounds for the estimate in terms of a novel variant of the $α$-Rényi divergence. We theoretically demonstrate the advantages of information borrowing across covariates over independent modeling. We show the practical advantages of the approach through simulation studies and illustrate the dependence structure in protein expression levels on breast cancer patients using CNV information as covariates.
Submitted 15 March, 2023;
originally announced March 2023.
-
Factorized Fusion Shrinkage for Dynamic Relational Data
Authors:
Peng Zhao,
Anirban Bhattacharya,
Debdeep Pati,
Bani K. Mallick
Abstract:
Modern data science applications often involve complex relational data with dynamic structures. An abrupt change in such dynamic relational data is typically observed in systems that undergo regime changes due to interventions. In such a case, we consider a factorized fusion shrinkage model in which all decomposed factors are dynamically shrunk towards group-wise fusion structures, where the shrinkage is obtained by applying global-local shrinkage priors to the successive differences of the row vectors of the factorized matrices. The proposed priors enjoy many favorable properties in comparison and clustering of the estimated dynamic latent factors. Comparing estimated latent factors involves both adjacent and long-term comparisons, with the time range of comparison considered as a variable. Under certain conditions, we demonstrate that the posterior distribution attains the minimax optimal rate up to logarithmic factors. In terms of computation, we present a structured mean-field variational inference framework that balances optimal posterior inference with computational scalability, exploiting both the dependence among components and across time. The framework can accommodate a wide variety of models, including dynamic matrix factorization, latent space models for networks and low-rank tensors. The effectiveness of our methodology is demonstrated through extensive simulations and real-world data analysis.
Submitted 12 July, 2024; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Structured Optimal Variational Inference for Dynamic Latent Space Models
Authors:
Peng Zhao,
Anirban Bhattacharya,
Debdeep Pati,
Bani K. Mallick
Abstract:
We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions. To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework, where the time-dependent properties of the dynamic networks are exploited to facilitate computation and inference. Additionally, an easy-to-implement block coordinate ascent algorithm is developed with message-passing type updates in each block, where the complexity per iteration is linear in the number of nodes and time points. To certify the optimality, we demonstrate that the variational risk of the proposed variational inference approach attains the minimax optimal rate up to a logarithmic factor under certain conditions. To this end, we first derive the minimax lower bound, which might be of independent interest. In addition, we show that the posterior under commonly adopted Gaussian random walk priors can achieve the minimax lower bound up to a logarithmic factor. To the best of our knowledge, this is the first such thorough theoretical analysis of Bayesian dynamic latent space models. Simulations and real data analysis demonstrate the efficacy of our methodology and the efficiency of our algorithm.
Submitted 15 October, 2024; v1 submitted 29 September, 2022;
originally announced September 2022.
-
Off-Policy Evaluation Using Information Borrowing and Context-Based Switching
Authors:
Sutanoy Dasgupta,
Yabo Niu,
Kishan Panaganti,
Dileep Kalathil,
Debdeep Pati,
Bani Mallick
Abstract:
We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to the OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from the 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also adaptively interpolates between this modified DM estimator and a modified DR estimator based on a context-specific switching rule. We give provable guarantees on the performance of the DR-IC estimator. We also demonstrate the superior performance of the DR-IC estimator compared to the state-of-the-art OPE algorithms on a number of benchmark problems.
Submitted 18 August, 2024; v1 submitted 18 December, 2021;
originally announced December 2021.
-
Adaptive posterior convergence in sparse high dimensional clipped generalized linear models
Authors:
Biraj Subhra Guha,
Debdeep Pati
Abstract:
We develop a framework to study posterior contraction rates in sparse high dimensional generalized linear models (GLM). We introduce a new family of GLMs, denoted by clipped GLM, which subsumes many standard GLMs and makes minor modification of the rest. With a sparsity inducing prior on the regression coefficients, we delineate sufficient conditions on true data generating density that leads to m…
▽ More
We develop a framework to study posterior contraction rates in sparse high dimensional generalized linear models (GLM). We introduce a new family of GLMs, denoted by clipped GLM, which subsumes many standard GLMs and makes minor modification of the rest. With a sparsity inducing prior on the regression coefficients, we delineate sufficient conditions on true data generating density that leads to minimax optimal rates of posterior contraction of the coefficients in $\ell_1$ norm. Our key contribution is to develop sufficient conditions commensurate with the geometry of the clipped GLM family, propose prior distributions which do not require any knowledge of the true parameters and avoid any assumption on the growth rate of the true coefficient vector.
Submitted 14 March, 2021;
originally announced March 2021.
-
A Hybrid Approximation to the Marginal Likelihood
Authors:
Eric Chuu,
Debdeep Pati,
Anirban Bhattacharya
Abstract:
Computing the marginal likelihood or evidence is one of the core challenges in Bayesian analysis. While there are many established methods for estimating this quantity, they predominantly rely on using a large number of posterior samples obtained from a Markov Chain Monte Carlo (MCMC) algorithm. As the dimension of the parameter space increases, however, many of these methods become prohibitively slow and potentially inaccurate. In this paper, we propose a novel method in which we use the MCMC samples to learn a high probability partition of the parameter space and then form a deterministic approximation over each of these partition sets. This two-step procedure, which constitutes both a probabilistic and a deterministic component, is termed a Hybrid approximation to the marginal likelihood. We demonstrate its versatility in a plethora of examples with varying dimension and sample size, and we also highlight the Hybrid approximation's effectiveness in situations where there is either a limited number or only approximate MCMC samples available.
Submitted 24 February, 2021;
originally announced February 2021.
-
Statistical Guarantees for Transformation Based Models with Applications to Implicit Variational Inference
Authors:
Sean Plummer,
Shuang Zhou,
Anirban Bhattacharya,
David Dunson,
Debdeep Pati
Abstract:
Transformation-based methods have been an attractive approach in non-parametric inference for problems such as unconditional and conditional density estimation due to their unique hierarchical structure that models the data as flexible transformation of a set of common latent variables. More recently, transformation-based models have been used in variational inference (VI) to construct flexible implicit families of variational distributions. However, their use in both non-parametric inference and variational inference lacks theoretical justification. We provide theoretical justification for the use of non-linear latent variable models (NL-LVMs) in non-parametric inference by showing that the support of the transformation induced prior in the space of densities is sufficiently large in the $L_1$ sense. We also show that, when a Gaussian process (GP) prior is placed on the transformation function, the posterior concentrates at the optimal rate up to a logarithmic factor. Adopting the flexibility demonstrated in the non-parametric setting, we use the NL-LVM to construct an implicit family of variational distributions, deemed GP-IVI. We delineate sufficient conditions under which GP-IVI achieves optimal risk bounds and approximates the true posterior in the sense of the Kullback-Leibler divergence. To the best of our knowledge, this is the first work on providing theoretical guarantees for implicit variational inference.
Submitted 4 November, 2020; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Statistical optimality and stability of tangent transform algorithms in logit models
Authors:
Indrajit Ghosh,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
A systematic approach to finding variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality by minorizing the marginal likelihood that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm by raising the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process. We explore a special case involving a semi-orthogonal design under which a global convergence is obtained. The theory is further illustrated using several numerical studies.
Submitted 25 October, 2020;
originally announced October 2020.
-
Statistical Guarantees and Algorithmic Convergence Issues of Variational Boosting
Authors:
Biraj Subhra Guha,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
We provide statistical guarantees for Bayesian variational boosting by proposing a novel small bandwidth Gaussian mixture variational family. We employ a functional version of Frank-Wolfe optimization as our variational algorithm and study frequentist properties of the iterative boosting updates. Comparisons are drawn to the recent literature on boosting, describing how the choice of the variational family and the discrepancy measure affect both convergence and finite-sample statistical properties of the optimization routine. Specifically, we first demonstrate stochastic boundedness of the boosting iterates with respect to the data generating distribution. We next integrate this within our algorithm to provide an explicit convergence rate, ending with a result on the required number of boosting updates.
Submitted 21 October, 2020; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Evidence bounds in singular models: probabilistic and variational perspectives
Authors:
Anirban Bhattacharya,
Debdeep Pati,
Sean Plummer
Abstract:
The marginal likelihood or evidence in Bayesian statistics contains an intrinsic penalty for larger model sizes and is a fundamental quantity in Bayesian model comparison. Over the past two decades, there has been steadily increasing activity to understand the nature of this penalty in singular statistical models, building on pioneering work by Sumio Watanabe. Unlike regular models where the Bayesian information criterion (BIC) encapsulates a first-order expansion of the logarithm of the marginal likelihood, parameter counting gets trickier in singular models where a quantity called the real log canonical threshold (RLCT) summarizes the effective model dimensionality. In this article, we offer a probabilistic treatment to recover non-asymptotic versions of established evidence bounds as well as prove a new result based on the Gibbs variational inequality. In particular, we show that mean-field variational inference correctly recovers the RLCT for any singular model in its canonical or normal form. We additionally exhibit sharpness of our bound by analyzing the dynamics of a general purpose coordinate ascent algorithm (CAVI) popularly employed in variational inference.
Submitted 11 August, 2020;
originally announced August 2020.
-
Dynamics of coordinate ascent variational inference: A case study in 2D Ising models
Authors:
Sean Plummer,
Debdeep Pati,
Anirban Bhattacharya
Abstract:
Variational algorithms have gained prominence over the past two decades as a scalable computational environment for Bayesian inference. In this article, we explore tools from the dynamical systems literature to study convergence of coordinate ascent algorithms for mean field variational inference. Focusing on the Ising model defined on two nodes, we fully characterize the dynamics of the sequential coordinate ascent algorithm and its parallel version. We observe that in the regime where the objective function is convex, both algorithms are stable and exhibit convergence to the unique fixed point. Our analyses reveal interesting {\em discordances} between these two versions of the algorithm in the region where the objective function is non-convex. In fact, the parallel version exhibits a periodic oscillatory behavior which is absent in the sequential version. Drawing intuition from the Markov chain Monte Carlo literature, we {\em empirically} show that a parameter expansion of the Ising model, popularly known as the Edwards--Sokal coupling, leads to an enlargement of the regime of convergence to the global optima.
Submitted 13 July, 2020;
originally announced July 2020.
-
Radius and equation of state constraints from massive neutron stars and GW190814
Authors:
Yeunhwan Lim,
Anirban Bhattacharya,
Jeremy W. Holt,
Debdeep Pati
Abstract:
Motivated by the unknown nature of the $2.50-2.67\,M_\odot$ compact object in the binary merger event GW190814, we study the maximum neutron star mass based on constraints from low-energy nuclear physics, neutron star tidal deformabilities from GW170817, and simultaneous mass-radius measurements of PSR J0030+045 from NICER. Our prior distribution is based on a combination of nuclear modeling valid in the vicinity of normal nuclear densities together with the assumption of a maximally stiff equation of state at high densities, a choice that enables us to probe the connection between observed heavy neutron stars and the transition density at which conventional nuclear physics models must break down. We demonstrate that a modification of the highly uncertain supra-saturation density equation of state beyond 2.64 times normal nuclear density is required in order for chiral effective field theory models to be consistent with current neutron star observations and the existence of $2.6\,M_\odot$ neutron stars. We also show that the existence of very massive neutron stars strongly impacts the radii of $\sim 2.0\,M_\odot$ neutron stars (but not necessarily the radii of $1.4\,M_\odot$ neutron stars), which further motivates future NICER radius measurements of PSR J1614-2230 and PSR J0740+6620.
Submitted 5 April, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Tail-adaptive Bayesian shrinkage
Authors:
Se Yoon Lee,
Peng Zhao,
Debdeep Pati,
Bani K. Mallick
Abstract:
Robust Bayesian methods for high-dimensional regression problems under diverse sparse regimes are studied. Traditional shrinkage priors are primarily designed to detect a handful of signals from tens of thousands of predictors in the so-called ultra-sparsity domain. However, they may not perform desirably when the degree of sparsity is moderate. In this paper, we propose a robust sparse estimation method under diverse sparsity regimes, which has a tail-adaptive shrinkage property. In this property, the tail-heaviness of the prior adjusts adaptively, becoming larger or smaller as the sparsity level increases or decreases, respectively, to accommodate more or fewer signals, a posteriori. We propose a global-local-tail (GLT) Gaussian mixture distribution that ensures this property. We examine the role of the tail-index of the prior in relation to the underlying sparsity level and demonstrate that the GLT posterior contracts at the minimax optimal rate for sparse normal mean models. We apply both the GLT prior and the Horseshoe prior to a real data problem and simulation examples. Our findings indicate that the varying tail rule based on the GLT prior offers advantages over a fixed tail rule based on the Horseshoe prior in diverse sparsity regimes.
Submitted 24 October, 2024; v1 submitted 4 July, 2020;
originally announced July 2020.
-
Nonasymptotic Laplace approximation under model misspecification
Authors:
Anirban Bhattacharya,
Debdeep Pati
Abstract:
We present non-asymptotic two-sided bounds to the log-marginal likelihood in Bayesian inference. The classical Laplace approximation is recovered as the leading term. Our derivation permits model misspecification and allows the parameter dimension to grow with the sample size. We do not make any assumptions about the asymptotic shape of the posterior, and instead require certain regularity conditions on the likelihood ratio and that the posterior be sufficiently concentrated.
Submitted 20 June, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Mass-shifting phenomenon of truncated multivariate normal priors
Authors:
Shuang Zhou,
Pallavi Ray,
Debdeep Pati,
Anirban Bhattacharya
Abstract:
We show that lower-dimensional marginal densities of dependent zero-mean normal distributions truncated to the positive orthant exhibit a mass-shifting phenomenon. Despite the truncated multivariate normal density having a mode at the origin, the marginal density assigns increasingly small mass near the origin as the dimension increases. The phenomenon accentuates with stronger correlation between the random variables. A precise quantification characterizing the role of the dimension as well as the dependence is provided. This surprising behavior has serious implications towards Bayesian constrained estimation and inference, where the prior, in addition to having a full support, is required to assign a substantial probability near the origin to capture flat parts of the true function of interest. Without further modification, we show that truncated normal priors are not suitable for modeling flat regions and propose a novel alternative strategy based on shrinking the coordinates using a multiplicative scale parameter. The proposed shrinkage prior is empirically shown to guard against the mass-shifting phenomenon while retaining computational efficiency.
Submitted 18 May, 2020; v1 submitted 25 January, 2020;
originally announced January 2020.
-
Bayesian Copula Density Deconvolution for Zero-Inflated Data in Nutritional Epidemiology
Authors:
Abhra Sarkar,
Debdeep Pati,
Bani K. Mallick,
Raymond J. Carroll
Abstract:
Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. The problem of estimating the density of the latent long-time intakes from their observed measurement error contaminated proxies is then a problem of deconvolution of densities with zero-inflated data. We propose a Bayesian semiparametric solution to the problem, building on a novel hierarchical latent variable framework that translates the problem to one involving continuous surrogates only. Crucial to accommodating important aspects of the problem, we then design a copula-based approach to model the involved joint distributions, adopting different modeling strategies for the marginals of the different dietary components. We design efficient Markov chain Monte Carlo algorithms for posterior inference and illustrate the efficacy of the proposed method through simulation experiments. Applied to our motivating nutritional epidemiology problems, compared to other approaches, our method provides more realistic estimates of the consumption patterns of episodically consumed dietary components.
Submitted 10 December, 2019;
originally announced December 2019.
-
Gaussian Processes with Errors in Variables: Theory and Computation
Authors:
Shuang Zhou,
Debdeep Pati,
Tianying Wang,
Yun Yang,
Raymond J. Carroll
Abstract:
Covariate measurement error in nonparametric regression is a common problem in nutritional epidemiology, geostatistics, and other fields. Over the last two decades, this problem has received substantial attention in the frequentist literature. Bayesian approaches for handling measurement error have only been explored recently and are surprisingly successful, despite the lack of a proper theoretical justification regarding the asymptotic performance of the estimators. By specifying a Gaussian process prior on the regression function and a Dirichlet process Gaussian mixture prior on the unknown distribution of the unobserved covariates, we show that the posterior distributions of the regression function and the unknown covariate density attain optimal rates of contraction adaptively over a range of Hölder classes, up to logarithmic terms. This improves upon the existing classical frequentist results which require knowledge of the smoothness of the underlying function to deliver optimal risk bounds. We also develop a novel surrogate prior for approximating the Gaussian process prior that leads to efficient computation and preserves the covariance structure, thereby facilitating easy prior elicitation. We demonstrate the empirical performance of our approach and compare it with competitors in a wide range of simulation experiments and a real data example.
Submitted 26 January, 2023; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Impact of Inference Accelerators on hardware selection
Authors:
Dibyajyoti Pati,
Caroline Favart,
Purujit Bahl,
Vivek Soni,
Yun-chan Tsai,
Michael Potter,
Jiahui Guan,
Xiaomeng Dong,
V. Ratna Saripalli
Abstract:
As opportunities for AI-assisted healthcare grow steadily, model deployment faces challenges due to the specific characteristics of the industry. The configuration choice for a production device can impact model performance while influencing operational costs. Moreover, in healthcare some situations might require fast, but not real time, inference. We study different configurations and conduct a cost-performance analysis to determine the optimized hardware for the deployment of a model subject to healthcare domain constraints. We observe that a naive performance comparison may not lead to an optimal configuration selection. In fact, given realistic domain constraints, CPU execution might be preferable to GPU accelerators. Hence, defining beforehand precise expectations for model deployment is crucial.
Submitted 7 October, 2019;
originally announced October 2019.
-
AI Assisted Annotator using Reinforcement Learning
Authors:
V. Ratna Saripalli,
Gopal Avinash,
Dibyajyoti Pati,
Michael Potter,
Charles W. Anderson
Abstract:
Healthcare data suffers from both noise and lack of ground truth. The cost of data increases as it is cleaned and annotated in healthcare. Unlike other data sets, medical data annotation, which is critical to accurate ground truth, requires medical domain expertise for a better patient outcome. In this work, we report on the use of reinforcement learning to mimic the decision making process of annotators for medical events, to automate annotation and labelling. The reinforcement agent learns to annotate alarm data based on annotations done by an expert. Our method shows promising results on medical alarm data sets. We trained DQN and A2C agents using the data from monitoring devices annotated by an expert. Initial results from these RL agents learning the expert annotation behavior are promising. The A2C agent performs better in terms of learning the sparse events in a given state, thereby choosing more correct actions than the DQN agent. To the best of our knowledge, this is the first reinforcement learning application for the automation of medical event annotation, which has far-reaching practical use.
Submitted 11 June, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Efficient Bayesian shape-restricted function estimation with constrained Gaussian process priors
Authors:
Pallavi Ray,
Debdeep Pati,
Anirban Bhattacharya
Abstract:
This article revisits the problem of Bayesian shape-restricted inference in the light of a recently developed approximate Gaussian process that admits an equivalent formulation of the shape constraints in terms of the basis coefficients. We propose a strategy to efficiently sample from the resulting constrained posterior by absorbing a smooth relaxation of the constraint in the likelihood and using circulant embedding techniques to sample from the unconstrained modified prior. We additionally pay careful attention to mitigate the computational complexity arising from updating hyperparameters within the covariance kernel of the Gaussian process. The developed algorithm is shown to be accurate and highly efficient in simulated and real data examples.
Submitted 12 February, 2019;
originally announced February 2019.
-
Bayesian Graph Selection Consistency Under Model Misspecification
Authors:
Yabo Niu,
Debdeep Pati,
Bani Mallick
Abstract:
Gaussian graphical models are a popular tool to learn the dependence structure in the form of a graph among variables of interest. Bayesian methods have gained in popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph and characterize uncertainty in the selection. For scalability of the Markov chain Monte Carlo algorithms, decomposability is commonly imposed on the graph space. A wide variety of graphical conjugate priors are proposed jointly on the covariance matrix and the graph with improved algorithms to search along the space of decomposable graphs, rendering the methods extremely popular in the context of multivariate dependence modeling. {\it An open problem} in Bayesian decomposable structure learning is whether the posterior distribution is able to select a meaningful decomposable graph that is ``close'' in an appropriate sense to the true non-decomposable graph, when the dimension of the variables increases with the sample size. In this article, we explore specific conditions on the true precision matrix and the graph that result in an affirmative answer to this question using a commonly used hyper-inverse Wishart prior on the covariance matrix and a suitable complexity prior on the graph space, both in the well-specified and misspecified settings. In the absence of structural sparsity assumptions, our strong selection consistency holds in a high dimensional setting where $p = O(n^α)$ for $α< 1/3$. We show that when the true graph is non-decomposable, the posterior distribution on the graph concentrates on a set of graphs that are {\it minimal triangulations} of the true graph.
Submitted 31 March, 2019; v1 submitted 14 January, 2019;
originally announced January 2019.
-
Bayesian Hierarchical Modeling on Covariance Valued Data
Authors:
Satwik Acharyya,
Zhengwu Zhang,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
Analysis of structural and functional connectivity (FC) of human brains is of pivotal importance for diagnosis of cognitive ability. The Human Connectome Project (HCP) provides an excellent source of neural data across different regions of interest (ROIs) of the living human brain. Individual specific data were available from an existing analysis (Dai et al., 2017) in the form of time varying covariance matrices representing the brain activity as the subjects perform a specific task. As a preliminary objective of studying the heterogeneity of brain connectomics across the population, we develop a probabilistic model for a sample of covariance matrices using a scaled Wishart distribution. We stress here that our data units are available in the form of covariance matrices, and we use the Wishart distribution to create our likelihood function rather than its more common usage as a prior on covariance matrices. Based on empirical explorations suggesting the data matrices to have low effective rank, we further model the center of the Wishart distribution using an orthogonal factor model type decomposition. We encourage shrinkage towards a low rank structure through a novel shrinkage prior and discuss strategies to sample from the posterior distribution using a combination of Gibbs and slice sampling. We extend our modeling framework to a dynamic setting to detect change points. The efficacy of the approach is explored in various simulation settings and exemplified on several case studies including our motivating HCP data.
Submitted 9 July, 2020; v1 submitted 1 November, 2018;
originally announced November 2018.
-
Revisiting the proton-radius problem using constrained Gaussian processes
Authors:
Shuang Zhou,
P. Giuliani,
J. Piekarewicz,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
Background: The "proton radius puzzle" refers to an eight-year old problem that highlights major inconsistencies in the extraction of the charge radius of the proton from muonic Lamb-shift experiments as compared against experiments using elastic electron scattering. For the latter, the determination of the charge radius involves an extrapolation of the experimental form factor to zero momentum transfer.
Purpose: To estimate the proton radius by introducing a novel non-parametric approach to model the electric form factor of the proton.
Methods: Within a Bayesian paradigm, we develop a model flexible enough to fit the data without any parametric assumptions on the form factor. The Bayesian estimation is guided by imposing only two physical constraints on the form factor: (a) its value at zero momentum transfer (normalization) and (b) its overall shape, assumed to be a monotonically decreasing function of the momentum transfer. Variants of these assumptions are explored to assess the impact of these constraints.
Results: So far our results are inconclusive in regard to the proton puzzle, as they depend on both the assumed constraints and the range of experimental data used. For example, if only low momentum-transfer data is used, adopting only the normalization constraint provides a value compatible with the smaller muonic result, while imposing only the shape constraint favors the larger electronic value.
Conclusions: We have presented a novel technique to estimate the proton radius from electron scattering data based on a non-parametric Gaussian process. We have shown the impact of the physical constraints imposed on the form factor and of the range of experimental data used. In this regard, we are hopeful that as this technique is refined and with the anticipated new results from the PRad experiment, we will get closer to resolving the puzzle.
Submitted 17 August, 2018;
originally announced August 2018.
-
The Soft Multivariate Truncated Normal Distribution with Applications to Bayesian Constrained Estimation
Authors:
Allyson Souris,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
We propose a new distribution, called the soft tMVN distribution, which provides a smooth approximation to the truncated multivariate normal (tMVN) distribution with linear constraints. An efficient blocked Gibbs sampler is developed to sample from the soft tMVN distribution in high dimensions. We provide theoretical support to the approximation capability of the soft tMVN and provide further empirical evidence thereof. The soft tMVN distribution can be used to approximate simulations from a multivariate truncated normal distribution with linear constraints, or itself as a prior in shape-constrained problems.
Submitted 2 September, 2019; v1 submitted 24 July, 2018;
originally announced July 2018.
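To make the "smooth approximation" in the soft tMVN abstract above concrete, here is a minimal sketch, not the authors' implementation. It assumes the hard indicator for each linear constraint is replaced by a logistic sigmoid with a sharpness parameter; the paper's exact functional form and its blocked Gibbs sampler may differ. The names `soft_tmvn_logpdf_unnorm`, `A`, `b`, and `eta` are hypothetical.

```python
import numpy as np

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    return -np.logaddexp(0.0, -z)

def soft_tmvn_logpdf_unnorm(x, mean, cov, A, b, eta):
    """Unnormalized log-density of a sigmoid-softened truncated normal.

    Assumption (illustrative): each hard constraint A[i] @ x >= b[i] is
    replaced by sigmoid(eta * (A[i] @ x - b[i])); larger eta means a
    sharper approximation to the indicator.
    """
    x = np.asarray(x, dtype=float)
    diff = x - mean
    gauss = -0.5 * diff @ np.linalg.solve(cov, diff)  # Gaussian kernel
    slack = A @ x - b                                  # >= 0 when feasible
    return gauss + log_sigmoid(eta * slack).sum()

# Positive-orthant constraint in two dimensions.
mean, cov = np.zeros(2), np.eye(2)
A, b = np.eye(2), np.zeros(2)
for eta in (1.0, 10.0, 50.0):
    inside = soft_tmvn_logpdf_unnorm([1.0, 1.0], mean, cov, A, b, eta)
    outside = soft_tmvn_logpdf_unnorm([-1.0, 1.0], mean, cov, A, b, eta)
    print(f"eta={eta}: inside={inside:.4f}, outside={outside:.4f}")
```

As `eta` grows, the value at a feasible point approaches the plain Gaussian kernel while infeasible points are penalized more heavily, which is the sense in which the soft density approximates the truncated one.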
-
Shape-Constrained Univariate Density Estimation
Authors:
Sutanoy Dasgupta,
Debdeep Pati,
Ian H. Jermyn,
Anuj Srivastava
Abstract:
While the problem of estimating a probability density function (pdf) from its observations is classical, the estimation under additional shape constraints is both important and challenging. We introduce an efficient, geometric approach for estimating pdfs given the number of their modes. This approach explores the space of constrained pdfs using an action of the diffeomorphism group that preserves their shapes. It starts with an initial template, with the desired number of modes and arbitrarily chosen heights at the critical points, and transforms it via: (1) composition by diffeomorphisms and (2) normalization to obtain the final density estimate. The search for optimal diffeomorphism is performed under the maximum-likelihood criterion and is accomplished by mapping diffeomorphisms to the tangent space of a Hilbert sphere, a vector space whose elements can be expressed using an orthogonal basis. This framework is first applied to shape-constrained univariate, unconditional pdf estimation and then extended to conditional pdf estimation. We derive asymptotic convergence rates of the estimator and demonstrate this approach using a synthetic dataset involving speed distribution for different traffic flows on Californian driveways.
Submitted 4 April, 2018;
originally announced April 2018.
-
On Statistical Optimality of Variational Bayes
Authors:
Debdeep Pati,
Anirban Bhattacharya,
Yun Yang
Abstract:
The article addresses a long-standing open problem on the justification of using variational Bayes methods for parameter estimation. We provide general conditions for obtaining optimal risk bounds for point estimates acquired from mean-field variational Bayesian inference. The conditions pertain to the existence of certain test functions for the distance metric on the parameter space and minimal assumptions on the prior. A general recipe for verification of the conditions is outlined which is broadly applicable to existing Bayesian models with or without latent variables. As illustrations, specific applications to Latent Dirichlet Allocation and Gaussian mixture models are discussed.
Submitted 24 December, 2017;
originally announced December 2017.
-
$α$-Variational Inference with Statistical Guarantees
Authors:
Yun Yang,
Debdeep Pati,
Anirban Bhattacharya
Abstract:
We propose a family of variational approximations to Bayesian posterior distributions, called $α$-VB, with provable statistical guarantees. The standard variational approximation is a special case of $α$-VB with $α=1$. When $α\in(0,1]$, a novel class of variational inequalities is developed for linking the Bayes risk under the variational approximation to the objective function in the variational optimization problem, implying that maximizing the evidence lower bound in variational inference has the effect of minimizing the Bayes risk within the variational density family. Operating in a frequentist setup, the variational inequalities imply that point estimates constructed from the $α$-VB procedure converge at an optimal rate to the true parameter in a wide range of problems. We illustrate our general theory with a number of examples, including the mean-field variational approximation to (low)-high-dimensional Bayesian linear regression with spike and slab priors, mixture of Gaussian models, latent Dirichlet allocation, and (mixture of) Gaussian variational approximation in regular parametric models.
Submitted 7 February, 2018; v1 submitted 9 October, 2017;
originally announced October 2017.
-
Frequentist coverage and sup-norm convergence rate in Gaussian process regression
Authors:
Yun Yang,
Anirban Bhattacharya,
Debdeep Pati
Abstract:
Gaussian process (GP) regression is a powerful interpolation technique due to its flexibility in capturing non-linearity. In this paper, we provide a general framework for understanding the frequentist coverage of point-wise and simultaneous Bayesian credible sets in GP regression. As an intermediate result, we develop a Bernstein-von Mises type result under supremum norm in random design GP regression. Identifying both the mean and covariance function of the posterior distribution of the Gaussian process as regularized $M$-estimators, we show that the sampling distribution of the posterior mean function and the centered posterior distribution can be respectively approximated by two population level GPs. By developing a comparison inequality between two GPs, we provide exact characterization of frequentist coverage probabilities of Bayesian point-wise credible intervals and simultaneous credible bands of the regression function. Our results show that inference based on GP regression tends to be conservative; when the prior is under-smoothed, the resulting credible intervals and bands have minimax-optimal sizes, with their frequentist coverage converging to a non-degenerate value between their nominal level and one. As a byproduct of our theory, we show that GP regression also yields a minimax-optimal posterior contraction rate relative to the supremum norm, which provides positive evidence on the long-standing problem of the optimal supremum norm contraction rate in GP regression.
Submitted 15 August, 2017;
originally announced August 2017.
-
Compressed Covariance Estimation With Automated Dimension Learning
Authors:
Gautam Sabnis,
Debdeep Pati,
Anirban Bhattacharya
Abstract:
We propose a method for estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. Experimental simulation results demonstrate the efficacy and scalability of our proposed approach.
Submitted 1 April, 2017;
originally announced April 2017.
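The compress, estimate, and lift pipeline described in the abstract above can be sketched as follows. This is only an illustration under loudly stated assumptions: the compression matrix is taken to be a Gaussian random projection and the "decompression operation" is taken to be its Moore-Penrose pseudo-inverse. The paper's actual operators, its low-rank plus diagonal structure, and its SURE-based choice of the compressed dimension are not reproduced here; `compressed_covariance` and `k` are hypothetical names.

```python
import numpy as np

def compressed_covariance(X, k, rng):
    """Estimate a p x p covariance via a k-dimensional compression.

    Assumptions (illustrative only): Gaussian random projection for
    compression, pseudo-inverse for decompression.
    """
    n, p = X.shape
    Phi = rng.normal(size=(k, p)) / np.sqrt(k)  # k x p compression matrix
    Y = X @ Phi.T                                # n x k compressed data
    Yc = Y - Y.mean(axis=0)                      # center in compressed space
    S_small = Yc.T @ Yc / (n - 1)                # k x k sample covariance
    Phi_pinv = np.linalg.pinv(Phi)               # p x k decompression map
    return Phi_pinv @ S_small @ Phi_pinv.T       # lift back to p x p

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
S_hat = compressed_covariance(X, k=5, rng=rng)
print(S_hat.shape, np.linalg.matrix_rank(S_hat, tol=1e-8))
```

Under these assumptions the lifted estimate has rank at most `k`, so it can only capture the low-rank part of the target; a diagonal component would need to be estimated separately.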
-
Adaptive posterior convergence rates in non-linear latent variable models
Authors:
Shuang Zhou,
Debdeep Pati,
Anirban Bhattacharya,
David Dunson
Abstract:
Non-linear latent variable models have become increasingly popular in a variety of applications. However, there has been little study on theoretical properties of these models. In this article, we study rates of posterior contraction in univariate density estimation for a class of non-linear latent variable models where unobserved U(0,1) latent variables are related to the response variables via a random non-linear regression with an additive error. Our approach relies on characterizing the space of densities induced by the above model as kernel convolutions with a general class of continuous mixing measures. The literature on posterior rates of contraction in density estimation almost entirely focuses on finite or countably infinite mixture models. We develop approximation results for our class of continuous mixing measures. Using an appropriate Gaussian process prior on the unknown regression function, we obtain the optimal frequentist rate up to a logarithmic factor under standard regularity conditions on the true density.
Submitted 25 January, 2017;
originally announced January 2017.
-
A Two-Step Geometric Framework For Density Modeling
Authors:
Sutanoy Dasgupta,
Debdeep Pati,
Anuj Srivastava
Abstract:
We introduce a novel two-step approach for estimating a probability density function (pdf) given its samples, with the second and important step coming from a geometric formulation. The procedure involves obtaining an initial estimate of the pdf and then transforming it via a warping function to reach the final estimate. The initial estimate is intended to be computationally fast, albeit suboptimal, but its warping creates a larger, flexible class of density functions, resulting in substantially improved estimation. The search for optimal warping is accomplished by mapping diffeomorphic functions to the tangent space of a Hilbert sphere, a vector space whose elements can be expressed using an orthogonal basis. Using a truncated basis expansion, we estimate the optimal warping under a (penalized) likelihood criterion and, thus, the optimal density estimate. This framework is introduced for univariate, unconditional pdf estimation and then extended to conditional pdf estimation. The approach avoids many of the computational pitfalls associated with classical conditional-density estimation methods, without losing on estimation performance. We derive asymptotic convergence rates of the density estimator and demonstrate this approach using both synthetic datasets and real data, the latter relating to the association of a toxic metabolite with preterm birth.
Submitted 12 December, 2017; v1 submitted 19 January, 2017;
originally announced January 2017.
-
Bayesian model selection consistency and oracle inequality with intractable marginal likelihood
Authors:
Yun Yang,
Debdeep Pati
Abstract:
In this article, we investigate large sample properties of model selection procedures in a general Bayesian framework when a closed form expression of the marginal likelihood function is not available or a local asymptotic quadratic approximation of the log-likelihood function does not exist. Under appropriate identifiability assumptions on the true model, we provide sufficient conditions for a Bayesian model selection procedure to be consistent and obey the Occam's razor phenomenon, i.e., the probability of selecting the "smallest" model that contains the truth tends to one as the sample size goes to infinity. In order to show that a Bayesian model selection procedure selects the smallest model containing the truth, we impose a prior anti-concentration condition, requiring the prior mass assigned by large models to a neighborhood of the truth to be sufficiently small. In a more general setting where the strong model identifiability assumption may not hold, we introduce the notion of local Bayesian complexity and develop oracle inequalities for Bayesian model selection procedures. Our Bayesian oracle inequality characterizes a trade-off between the approximation error and a Bayesian characterization of the local complexity of the model, illustrating the adaptive nature of averaging-based Bayesian procedures towards achieving an optimal rate of posterior convergence. Specific applications of the model selection theory are discussed in the context of high-dimensional nonparametric regression and density regression where the regression function or the conditional density is assumed to depend on a fixed subset of predictors. As a result of independent interest, we propose a general technique for obtaining upper bounds of certain small ball probability of stationary Gaussian processes.
Submitted 9 January, 2017; v1 submitted 1 January, 2017;
originally announced January 2017.
-
Monte Carlo goodness-of-fit tests for degree corrected and related stochastic blockmodels
Authors:
Vishesh Karwa,
Debdeep Pati,
Sonja Petrović,
Liam Solus,
Nikita Alexeev,
Mateja Raič,
Dane Wilburne,
Robert Williams,
Bowei Yan
Abstract:
We construct Bayesian and frequentist finite-sample goodness-of-fit tests for three different variants of the stochastic blockmodel for network data. Since all of the stochastic blockmodel variants are log-linear in form when block assignments are known, the tests for the latent block model versions combine a block membership estimator with the algebraic statistics machinery for testing goodness-of-fit in log-linear models. We describe Markov bases and marginal polytopes of the variants of the stochastic blockmodel, and discuss how both facilitate the development of goodness-of-fit tests and understanding of model behavior.
The general testing methodology developed here extends to any finite mixture of log-linear models on discrete data, and as such is the first application of the algebraic statistics machinery for latent-variable models.
Submitted 6 March, 2024; v1 submitted 18 December, 2016;
originally announced December 2016.
-
A Divide and Conquer Strategy for High Dimensional Bayesian Factor Models
Authors:
Gautam Sabnis,
Debdeep Pati,
Barbara Engelhardt,
Natesh Pillai
Abstract:
We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide and conquer methods focus exclusively on dividing the total number of observations $n$ into subsamples while keeping the dimension $p$ fixed. Our approach is novel in this regard: it includes all of the $n$ samples in each subproblem and, instead, splits the dimension $p$ into smaller subsets for each subproblem. The subproblems themselves can be challenging to solve when $p$ is large due to the dependencies across dimensions. To circumvent this issue, we specify a novel hierarchical structure on the latent factors that allows for flexible dependencies across dimensions, while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to have computational efficiency of several orders of magnitude in comparison to fitting a full factor model. We report the performance of our method in synthetic examples and a genomics application.
Submitted 28 December, 2016; v1 submitted 8 December, 2016;
originally announced December 2016.
-
Bayesian fractional posteriors
Authors:
Anirban Bhattacharya,
Debdeep Pati,
Yun Yang
Abstract:
We consider the fractional posterior distribution that is obtained by updating a prior distribution via Bayes' theorem with a fractional likelihood function, a usual likelihood function raised to a fractional power. First, we analyze the contraction property of the fractional posterior in a general misspecified framework. Our contraction results only require a prior mass condition on a certain Kullback-Leibler (KL) neighborhood of the true parameter (or the KL divergence minimizer in the misspecified case), and obviate constructions of test functions and sieves commonly used in the literature for analyzing the contraction property of a regular posterior. We show through a counterexample that some condition controlling the complexity of the parameter space is necessary for the regular posterior to contract, rendering additional flexibility on the choice of the prior for the fractional posterior. Second, we derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in misspecified models. Our derivation reveals several advantages of averaging based Bayesian procedures over optimization based frequentist procedures. As an application of the Bayesian oracle inequality, we derive a sharp oracle inequality in the convex regression problem under an arbitrary dimension. We also illustrate the theory in Gaussian process regression and density estimation problems.
Submitted 7 November, 2016; v1 submitted 3 November, 2016;
originally announced November 2016.
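The fractional posterior defined in the abstract above (prior times likelihood raised to a power) has a closed form in simple conjugate models, which gives a quick way to see its effect. The sketch below is an illustration, not taken from the paper: it assumes a normal likelihood with known variance and a normal prior on the mean, and the function name `fractional_posterior_normal_mean` is hypothetical.

```python
import numpy as np

def fractional_posterior_normal_mean(x, sigma2, prior_mean, prior_var, alpha):
    """Fractional posterior of a normal mean with known variance sigma2.

    The N(theta, sigma2) likelihood is raised to the power alpha and
    combined with a N(prior_mean, prior_var) prior. Returns the posterior
    (mean, variance); alpha = 1 recovers the usual posterior.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Tempering the likelihood scales its precision contribution by alpha.
    precision = 1.0 / prior_var + alpha * n / sigma2
    mean = (prior_mean / prior_var + alpha * x.sum() / sigma2) / precision
    return mean, 1.0 / precision

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)
for alpha in (1.0, 0.5):
    m, v = fractional_posterior_normal_mean(data, 1.0, 0.0, 10.0, alpha)
    print(f"alpha={alpha}: mean={m:.3f}, var={v:.4f}")
```

In this toy model, choosing `alpha < 1` behaves like discounting the sample size to `alpha * n`, so the fractional posterior is more spread out than the regular one.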
-
Sparse additive Gaussian process with soft interactions
Authors:
Garret Vo,
Debdeep Pati
Abstract:
Additive nonparametric regression models provide an attractive tool for variable selection in high dimensions when the relationship between the response and predictors is complex. They offer greater flexibility compared to parametric non-linear regression models and better interpretability and scalability than the non-parametric regression models. However, achieving sparsity simultaneously in the number of nonparametric components as well as in the variables within each nonparametric component poses a stiff computational challenge. In this article, we develop a novel Bayesian additive regression model using a combination of hard and soft shrinkages to separately control the number of additive components and the variables within each component. An efficient algorithm is developed to select the important variables and estimate the interaction network. Excellent performance is obtained in simulated and real data examples.
Submitted 9 July, 2016;
originally announced July 2016.
-
Sub-optimality of some continuous shrinkage priors
Authors:
Anirban Bhattacharya,
David B. Dunson,
Debdeep Pati,
Natesh S. Pillai
Abstract:
Two-component mixture priors provide a traditional way to induce sparsity in high-dimensional Bayes models. However, several aspects of such a prior, including computational complexities in high dimensions, interpretation of exact zeros and non-sparse posterior summaries under standard loss functions, have motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians. Interestingly, we demonstrate that many commonly used shrinkage priors, including the Bayesian Lasso, do not have adequate posterior concentration in high-dimensional settings.
Submitted 18 May, 2016;
originally announced May 2016.
-
Bayesian Variable Selection for Skewed Heteroscedastic Response
Authors:
Libo Wang,
Yuanyuan Tang,
Debajyoti Sinha,
Debdeep Pati,
Stuart Lipsitz
Abstract:
In this article, we propose new Bayesian methods for selecting and estimating a sparse coefficient vector for skewed heteroscedastic response. Our novel Bayesian procedures effectively estimate the median and other quantile functions, accommodate non-local priors for regression effects without compromising ease of implementation via sampling based tools, and asymptotically select the true set of predictors even when the number of covariates grows at the same order as the sample size. We also extend our method to deal with some observations with very large errors. Via simulation studies and a re-analysis of a medical cost study with a large number of potential predictors, we illustrate the ease of implementation and other practical advantages of our approach compared to existing methods for such studies.
Submitted 3 July, 2017; v1 submitted 29 February, 2016;
originally announced February 2016.