-
Identifying Metric Structures of Deep Latent Variable Models
Authors: Stas Syrota, Yevgen Zainchkovskyy, Johnny Xi, Benjamin Bloem-Reddy, Søren Hauberg
Abstract:
Deep latent variable models learn condensed representations of data that, hopefully, reflect the inner workings of the studied phenomena. Unfortunately, these latent representations are not statistically identifiable, meaning they cannot be uniquely determined. Domain experts, therefore, need to tread carefully when interpreting these. Current solutions limit the lack of identifiability through additional constraints on the latent variable model, e.g. by requiring labeled training data, or by restricting the expressivity of the model. We change the goal: instead of identifying the latent variables, we identify relationships between them such as meaningful distances, angles, and volumes. We prove this is feasible under very mild model conditions and without additional labeled data. We empirically demonstrate that our theory results in more reliable latent distances, offering a principled path forward in extracting trustworthy conclusions from deep latent variable models.
Submitted 30 May, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Distinguishing Cause from Effect with Causal Velocity Models
Authors: Johnny Xi, Hugh Dance, Peter Orbanz, Benjamin Bloem-Reddy
Abstract:
Bivariate structural causal models (SCM) are often used to infer causal direction by examining their goodness-of-fit under restricted model classes. In this paper, we describe a parametrization of bivariate SCMs in terms of a causal velocity by viewing the cause variable as time in a dynamical system. The velocity implicitly defines counterfactual curves via the solution of initial value problems where the observation specifies the initial condition. Using tools from measure transport, we obtain a unique correspondence between SCMs and the score function of the generated distribution via its causal velocity. Based on this, we derive an objective function that directly regresses the velocity against the score function, the latter of which can be estimated non-parametrically from observational data. We use this to develop a method for bivariate causal discovery that extends beyond known model classes such as additive or location scale noise, and that requires no assumptions on the noise distributions. When the score is estimated well, the objective is also useful for detecting model non-identifiability and misspecification. We present positive results in simulation and benchmark experiments where many existing methods fail, and perform ablation studies to examine the method's sensitivity to accurate score estimation.
Submitted 9 June, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
Mixed Variational Flows for Discrete Variables
Authors: Gian Carlo Diluvi, Benjamin Bloem-Reddy, Trevor Campbell
Abstract:
Variational flows allow practitioners to learn complex continuous distributions, but approximating discrete distributions remains a challenge. Current methodologies typically embed the discrete target in a continuous space - usually via continuous relaxation or dequantization - and then apply a continuous flow. These approaches involve a surrogate target that may not capture the original discrete target, might have biased or unstable gradients, and can create a difficult optimization problem. In this work, we develop a variational flow family for discrete distributions without any continuous embedding. First, we develop a measure-preserving and discrete (MAD) invertible map that leaves the discrete target invariant, and then create a mixed variational flow (MAD Mix) based on that map. Our family provides access to i.i.d. sampling and density evaluation with virtually no tuning effort. We also develop an extension to MAD Mix that handles joint discrete and continuous models. Our experiments suggest that MAD Mix produces more reliable approximations than continuous-embedding flows while being significantly faster to train.
Submitted 26 February, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Indeterminacy in Generative Models: Characterization and Strong Identifiability
Authors: Quanhan Xi, Benjamin Bloem-Reddy
Abstract:
Most modern probabilistic generative models, such as the variational autoencoder (VAE), have certain indeterminacies that are unresolvable even with an infinite amount of data. Different tasks tolerate different indeterminacies; however, recent applications have indicated the need for strongly identifiable models, in which an observation corresponds to a unique latent code. Progress has been made towards reducing model indeterminacies while maintaining flexibility, and recent work excludes many--but not all--indeterminacies. In this work, we motivate model-identifiability in terms of task-identifiability, then construct a theoretical framework for analyzing the indeterminacies of latent variable models, which enables their precise characterization in terms of the generator function and prior distribution spaces. We reveal that strong identifiability is possible even with highly flexible nonlinear generators, and give two such examples. One is a straightforward modification of iVAE (arXiv:1907.04809 [stat.ML]); the other uses triangular monotonic maps, leading to novel connections between optimal transport and identifiability.
Submitted 2 March, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Lossy Compression for Lossless Prediction
Authors: Yann Dubois, Benjamin Bloem-Reddy, Karen Ullrich, Chris J. Maddison
Abstract:
Most data is automatically collected and only ever "seen" by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize the bit-rate required to ensure high performance on all predictive tasks that are invariant under a set of transformations, such as data augmentations. Based on our theory, we design unsupervised objectives for training neural compressors. Using these objectives, we train a generic image compressor that achieves substantial rate savings (more than $1000\times$ on ImageNet) compared to JPEG on 8 datasets, without decreasing downstream classification performance.
Submitted 28 January, 2022; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Uncertainty in Neural Processes
Authors: Saeid Naderiparizi, Kenny Chiu, Benjamin Bloem-Reddy, Frank Wood
Abstract:
We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically, we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments.
Submitted 8 October, 2020;
originally announced October 2020.
-
On the Benefits of Invariance in Neural Networks
Authors: Clare Lyle, Mark van der Wilk, Marta Kwiatkowska, Yarin Gal, Benjamin Bloem-Reddy
Abstract:
Many real-world data analysis problems exhibit invariant structure, and models that take advantage of this structure have shown impressive empirical performance, particularly in deep learning. While the literature contains a variety of methods to incorporate invariance into models, theoretical understanding is poor and there is no way to assess when one method should be preferred over another. In this work, we analyze the benefits and limitations of two widely used approaches in deep learning in the presence of invariance: data augmentation and feature averaging. We prove that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds. We provide empirical support of these theoretical results, including a demonstration of why generalization may not improve by training with data augmentation: the 'learned invariance' fails outside of the training distribution.
Submitted 30 April, 2020;
originally announced May 2020.
-
Probabilistic symmetries and invariant neural networks
Authors: Benjamin Bloem-Reddy, Yee Whye Teh
Abstract:
Treating neural network inputs and outputs as random variables, we characterize the structure of neural networks that can be used to model data that are invariant or equivariant under the action of a compact group. Much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures, in an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings. By considering group invariance from the perspective of probabilistic symmetry, we establish a link between functional and probabilistic symmetry, and obtain generative functional representations of probability distributions that are invariant or equivariant under the action of a compact group. Our representations completely characterize the structure of neural networks that can be used to model such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We demonstrate that examples from the recent literature are special cases, and develop the details of the general program for exchangeable sequences and arrays.
Submitted 16 September, 2020; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Sequential sampling of Gaussian process latent variable models
Authors: Martin Tegner, Benjamin Bloem-Reddy, Stephen Roberts
Abstract:
We consider the problem of inferring a latent function in a probabilistic model of data. When dependencies of the latent function are specified by a Gaussian process and the data likelihood is complex, efficient computation often involves Markov chain Monte Carlo sampling with limited applicability to large data sets. We extend some of these techniques to scale efficiently when the problem exhibits a sequential structure. We propose an approximation that enables sequential sampling of both latent variables and associated parameters. We demonstrate strong performance in growing-data settings that would otherwise be unfeasible with naive, non-sequential sampling.
Submitted 20 July, 2018; v1 submitted 13 July, 2018;
originally announced July 2018.
-
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Authors: Benjamin Bloem-Reddy, Adam Foster, Emile Mathieu, Yee Whye Teh
Abstract:
Empirical evidence suggests that heavy-tailed degree distributions occurring in many real networks are well-approximated by power laws with exponents $η$ that may take values either less than or greater than two. Models based on various forms of exchangeability are able to capture power laws with $η< 2$, and admit tractable inference algorithms; we draw on previous results to show that $η> 2$ cannot be generated by the forms of exchangeability used in existing random graph models. Preferential attachment models generate power law exponents greater than two, but have been of limited use as statistical models due to the inherent difficulty of performing inference in non-exchangeable models. Motivated by this gap, we design and implement inference algorithms for a recently proposed class of models that generates $η$ of all possible values. We show that although they are not exchangeable, these models have probabilistic structure amenable to inference. Our methods make a large class of previously intractable models useful for statistical inference.
Submitted 9 July, 2018;
originally announced July 2018.
-
Preferential Attachment and Vertex Arrival Times
Authors: Benjamin Bloem-Reddy, Peter Orbanz
Abstract:
We study preferential attachment mechanisms in random graphs that are parameterized by (i) a constant bias affecting the degree-biased distribution on the vertex set and (ii) the distribution of times at which new vertices are created by the model. The class of random graphs so defined admits a representation theorem reminiscent of residual allocation, or "stick-breaking" schemes. We characterize how the vertex arrival times affect the asymptotic degree distribution, and relate the latter to neutral-to-the-left processes. Our random graphs generate edges "one end at a time", which sets up a one-to-one correspondence between random graphs and random partitions of natural numbers; via this map, our representation induces a result on (not necessarily exchangeable) random partitions that generalizes a theorem of Griffiths and Spanó. A number of examples clarify how the class intersects with several known random graph models.
Submitted 5 October, 2017;
originally announced October 2017.