-
Bayesian Circular Regression with von Mises Quasi-Processes
Authors:
Yarden Cohen,
Alexandre Khae Wu Navarro,
Jes Frellsen,
Richard E. Turner,
Raziel Riemer,
Ari Pakman
Abstract:
The need for regression models to predict circular values arises in many scientific fields. In this work we explore a family of expressive and interpretable distributions over circle-valued random functions related to Gaussian processes targeting two Euclidean dimensions conditioned on the unit circle. The probability model has connections with continuous spin models in statistical physics. Moreover, its density is very simple and has maximum-entropy, unlike previous Gaussian process-based approaches, which use wrapping or radial marginalization. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Gibbs sampling. We argue that transductive learning in these models favors a Bayesian approach to the parameters and apply our sampling scheme to the Double Metropolis-Hastings algorithm. We present experiments applying this model to the prediction of (i) wind directions and (ii) the percentage of the running gait cycle as a function of joint angles.
Submitted 18 March, 2025; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Super-Efficient Exact Hamiltonian Monte Carlo for the von Mises Distribution
Authors:
Ari Pakman
Abstract:
Markov Chain Monte Carlo algorithms, the method of choice to sample from generic high-dimensional distributions, are rarely used for continuous one-dimensional distributions, for which more effective approaches are usually available (e.g. rejection sampling). In this work we present a counter-example to this conventional wisdom for the von Mises distribution, a maximum-entropy distribution over the circle. We show that Hamiltonian Monte Carlo with Laplacian momentum has exactly solvable equations of motion and, with an appropriate travel time, the Markov chain has negative autocorrelation at odd lags for odd observables and yields a relative effective sample size bigger than one.
Submitted 7 December, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Marginalizable Density Models
Authors:
Dar Gilboa,
Ari Pakman,
Thibault Vatter
Abstract:
Probability density models based on deep networks have achieved remarkable success in modeling complex high-dimensional datasets. However, unlike kernel density estimators, modern neural models do not yield marginals or conditionals in closed form, as these quantities require the evaluation of seldom tractable integrals. In this work, we present the Marginalizable Density Model Approximator (MDMA), a novel deep network architecture which provides closed form expressions for the probabilities, marginals and conditionals of any subset of the variables. The MDMA learns deep scalar representations for each individual variable and combines them via learned hierarchical tensor decompositions into a tractable yet expressive CDF, from which marginals and conditional densities are easily obtained. We illustrate the advantage of exact marginalizability in several tasks that are out of reach of previous deep network-based density estimation models, such as estimating mutual information between arbitrary subsets of variables, inferring causality by testing for conditional independence, and inference with missing data without the need for data imputation, outperforming state-of-the-art models on these tasks. The model also allows for parallelized sampling with only a logarithmic dependence of the time complexity on the number of variables.
Submitted 8 June, 2021;
originally announced June 2021.
-
Estimating the Unique Information of Continuous Variables
Authors:
Ari Pakman,
Amin Nejatbakhsh,
Dar Gilboa,
Abdullah Makkeh,
Luca Mazzucato,
Michael Wibral,
Elad Schneidman
Abstract:
The integration and transfer of information from multiple sources to multiple targets is a core motive of neural systems. The emerging field of partial information decomposition (PID) provides a novel information-theoretic lens into these mechanisms by identifying synergistic, redundant, and unique contributions to the mutual information between one and several variables. While many works have studied aspects of PID for Gaussian and discrete distributions, the case of general continuous distributions is still uncharted territory. In this work we present a method for estimating the unique information in continuous distributions, for the case of one versus two variables. Our method solves the associated optimization problem over the space of distributions with fixed bivariate marginals by combining copula decompositions and techniques developed to optimize variational autoencoders. We obtain excellent agreement with known analytic results for Gaussians, and illustrate the power of our new approach in several brain-inspired neural models. Our method is capable of recovering the effective connectivity of a chaotic network of rate neurons, and uncovers a complex trade-off between redundancy, synergy and unique information in recurrent networks trained to solve a generalized XOR task.
Submitted 26 October, 2021; v1 submitted 30 January, 2021;
originally announced February 2021.
-
Amortized Probabilistic Detection of Communities in Graphs
Authors:
Yueqi Wang,
Yoonho Lee,
Pallab Basu,
Juho Lee,
Yee Whye Teh,
Liam Paninski,
Ari Pakman
Abstract:
Learning community structures in graphs has broad applications across scientific domains. While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty. We propose a simple framework for amortized community detection, which addresses both of these issues by combining the expressive power of GNNs with recent methods for amortized clustering. Our models consist of a graph representation backbone that extracts structural information and an amortized clustering network that naturally handles variable numbers of clusters. Both components combine into well-defined models of the posterior distribution of graph communities and are jointly optimized given labeled graphs. At inference time, the models yield parallel samples from the posterior of community labels, quantifying uncertainty in a principled way. We evaluate several models from our framework on synthetic and real datasets, and demonstrate improved performance compared to previous methods. As a separate contribution, we extend recent amortized probabilistic clustering architectures by adding attention modules, which yield further improvements on community detection tasks.
Submitted 2 August, 2024; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Neural Clustering Processes
Authors:
Ari Pakman,
Yueqi Wang,
Catalin Mitelut,
JinHyung Lee,
Liam Paninski
Abstract:
Probabilistic clustering models (or equivalently, mixture models) are basic building blocks in countless statistical models and involve latent random variables over discrete spaces. For these models, posterior inference methods can be inaccurate and/or very slow. In this work we introduce deep network architectures trained with labeled samples from any generative model of clustered datasets. At test time, the networks generate approximate posterior samples of cluster labels for any new dataset of arbitrary size. We develop two complementary approaches to this task, requiring either O(N) or O(K) network forward passes per dataset, where N is the dataset size and K the number of clusters. Unlike previous approaches, our methods sample the labels of all the data points from a well-defined posterior, and can learn nonparametric Bayesian posteriors since they do not limit the number of mixture components. As a scientific application, we present a novel approach to neural spike sorting for high-density multielectrode arrays.
Submitted 23 June, 2020; v1 submitted 28 December, 2018;
originally announced January 2019.
-
Amortized Bayesian inference for clustering models
Authors:
Ari Pakman,
Liam Paninski
Abstract:
We develop methods for efficient amortized approximate Bayesian inference over posterior distributions of probabilistic clustering models, such as Dirichlet process mixture models. The approach is based on mapping distributed, symmetry-invariant representations of cluster arrangements into conditional probabilities. The method parallelizes easily, yields iid samples from the approximate posterior of cluster assignments at the same computational cost as a single Gibbs sampler sweep, and can easily be applied to both conjugate and non-conjugate models, as training only requires samples from the generative model.
Submitted 23 November, 2018;
originally announced November 2018.
-
Binary Bouncy Particle Sampler
Authors:
Ari Pakman
Abstract:
The Bouncy Particle Sampler is a novel rejection-free non-reversible sampler for differentiable probability distributions over continuous variables. We generalize the algorithm to piecewise differentiable distributions and apply it to generic binary distributions using a piecewise differentiable augmentation. We illustrate the new algorithm in a binary Markov Random Field example, and compare it to binary Hamiltonian Monte Carlo. Our results suggest that binary BPS samplers are better for easy-to-mix distributions.
Submitted 2 November, 2017;
originally announced November 2017.
-
Stochastic Bouncy Particle Sampler
Authors:
Ari Pakman,
Dar Gilboa,
David Carlson,
Liam Paninski
Abstract:
We introduce a novel stochastic version of the non-reversible, rejection-free Bouncy Particle Sampler (BPS), a Markov process whose sample trajectories are piecewise linear. The algorithm is based on simulating first arrival times in a doubly stochastic Poisson process using the thinning method, and allows efficient sampling of Bayesian posteriors in big datasets. We prove that in the BPS no bias is introduced by noisy evaluations of the log-likelihood gradient. On the other hand, we argue that efficiency considerations favor a small, controllable bias in the construction of the thinning proposals, in exchange for faster mixing. We introduce a simple regression-based proposal intensity for the thinning method that controls this trade-off. We illustrate the algorithm in several examples in which it outperforms both unbiased, but slowly mixing stochastic versions of BPS, as well as biased stochastic gradient-based samplers.
Submitted 13 June, 2017; v1 submitted 2 September, 2016;
originally announced September 2016.
-
Partition Functions from Rao-Blackwellized Tempered Sampling
Authors:
David Carlson,
Patrick Stinson,
Ari Pakman,
Liam Paninski
Abstract:
Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBM); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost.
Submitted 25 May, 2016; v1 submitted 6 March, 2016;
originally announced March 2016.
-
Bayesian spike inference from calcium imaging data
Authors:
Eftychios A. Pnevmatikakis,
Josh Merel,
Ari Pakman,
Liam Paninski
Abstract:
We present efficient Bayesian methods for extracting neuronal spiking information from calcium imaging data. The goal of our methods is to sample from the posterior distribution of spike trains and model parameters (baseline concentration, spike amplitude etc) given noisy calcium imaging data. We present discrete time algorithms where we sample the existence of a spike at each time bin using Gibbs methods, as well as continuous time algorithms where we sample over the number of spikes and their locations at an arbitrary resolution using Metropolis-Hastings methods for point processes. We provide Rao-Blackwellized extensions that (i) marginalize over several model parameters and (ii) provide smooth estimates of the marginal spike posterior distribution in continuous time. Our methods serve as complements to standard point estimates and allow for quantification of uncertainty in estimating the underlying spike train and model parameters.
Submitted 26 November, 2013;
originally announced November 2013.
-
Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
Authors:
Ari Pakman,
Liam Paninski
Abstract:
We present a new approach to sample from generic binary distributions, based on an exact Hamiltonian Monte Carlo algorithm applied to a piecewise continuous augmentation of the binary distribution of interest. An extension of this idea to distributions over mixtures of binary and possibly-truncated Gaussian or exponential variables allows us to sample from posteriors of linear and probit regression models with spike-and-slab priors and truncated parameters. We illustrate the advantages of these algorithms in several examples in which they outperform the Metropolis or Gibbs samplers.
Submitted 12 October, 2015; v1 submitted 9 November, 2013;
originally announced November 2013.
-
Exact Hamiltonian Monte Carlo for Truncated Multivariate Gaussians
Authors:
Ari Pakman,
Liam Paninski
Abstract:
We present a Hamiltonian Monte Carlo algorithm to sample from multivariate Gaussian distributions in which the target space is constrained by linear and quadratic inequalities or products thereof. The Hamiltonian equations of motion can be integrated exactly and there are no parameters to tune. The algorithm mixes faster and is more efficient than Gibbs sampling. The runtime depends on the number and shape of the constraints but the algorithm is highly parallelizable. In many cases, we can exploit special structure in the covariance matrices of the untruncated Gaussian to further speed up the runtime. A simple extension of the algorithm permits sampling from distributions whose log-density is piecewise quadratic, as in the "Bayesian Lasso" model.
Submitted 5 June, 2013; v1 submitted 20 August, 2012;
originally announced August 2012.