-
Control-ITRA: Controlling the Behavior of a Driving Model
Authors:
Vasileios Lioutas,
Adam Scibior,
Matthew Niedoba,
Berend Zwartsenberg,
Frank Wood
Abstract:
Simulating realistic driving behavior is crucial for developing and testing autonomous systems in complex traffic environments. Equally important is the ability to control the behavior of simulated agents to tailor scenarios to specific research needs and safety considerations. This paper extends the general-purpose multi-agent driving behavior model ITRA (Scibior et al., 2021), by introducing a m…
▽ More
Simulating realistic driving behavior is crucial for developing and testing autonomous systems in complex traffic environments. Equally important is the ability to control the behavior of simulated agents to tailor scenarios to specific research needs and safety considerations. This paper extends the general-purpose multi-agent driving behavior model ITRA (Scibior et al., 2021), by introducing a method called Control-ITRA to influence agent behavior through waypoint assignment and target speed modulation. By conditioning agents on these two aspects, we provide a mechanism for them to adhere to specific trajectories and indirectly adjust their aggressiveness. We compare different approaches for integrating these conditions during training and demonstrate that our method can generate controllable, infraction-free trajectories while preserving realism in both seen and unseen locations.
△ Less
Submitted 16 January, 2025;
originally announced January 2025.
-
All-in-one simulation-based inference
Authors:
Manuel Gloeckler,
Michael Deistler,
Christian Weilbach,
Frank Wood,
Jakob H. Macke
Abstract:
Amortized Bayesian inference trains neural networks to solve stochastic inference problems using model simulations, thereby making it possible to rapidly perform Bayesian inference for any newly observed data. However, current simulation-based amortized inference methods are simulation-hungry and inflexible: They require the specification of a fixed parametric prior, simulator, and inference tasks…
▽ More
Amortized Bayesian inference trains neural networks to solve stochastic inference problems using model simulations, thereby making it possible to rapidly perform Bayesian inference for any newly observed data. However, current simulation-based amortized inference methods are simulation-hungry and inflexible: They require the specification of a fixed parametric prior, simulator, and inference tasks ahead of time. Here, we present a new amortized inference method -- the Simformer -- which overcomes these limitations. By training a probabilistic diffusion model with transformer architectures, the Simformer outperforms current state-of-the-art amortized inference approaches on benchmark tasks and is substantially more flexible: It can be applied to models with function-valued parameters, it can handle inference scenarios with missing or unstructured data, and it can sample arbitrary conditionals of the joint distribution of parameters and data, including both posterior and likelihood. We showcase the performance and flexibility of the Simformer on simulators from ecology, epidemiology, and neuroscience, and demonstrate that it opens up new possibilities and application domains for amortized Bayesian inference on simulation-based models.
△ Less
Submitted 15 July, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Nearest Neighbour Score Estimators for Diffusion Generative Models
Authors:
Matthew Niedoba,
Dylan Green,
Saeid Naderiparizi,
Vasileios Lioutas,
Jonathan Wilder Lavington,
Xiaoxuan Liang,
Yunpeng Liu,
Ke Zhang,
Setareh Dabiri,
Adam Ścibior,
Berend Zwartsenberg,
Frank Wood
Abstract:
Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set…
▽ More
Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research.
△ Less
Submitted 16 July, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Don't be so negative! Score-based Generative Modeling with Oracle-assisted Guidance
Authors:
Saeid Naderiparizi,
Xiaoxuan Liang,
Berend Zwartsenberg,
Frank Wood
Abstract:
The maximum likelihood principle advocates parameter estimation via optimization of the data likelihood function. Models estimated in this way can exhibit a variety of generalization characteristics dictated by, e.g. architecture, parameterization, and optimization bias. This work addresses model learning in a setting where there further exists side-information in the form of an oracle that can la…
▽ More
The maximum likelihood principle advocates parameter estimation via optimization of the data likelihood function. Models estimated in this way can exhibit a variety of generalization characteristics dictated by, e.g. architecture, parameterization, and optimization bias. This work addresses model learning in a setting where there further exists side-information in the form of an oracle that can label samples as being outside the support of the true data generating distribution. Specifically we develop a new denoising diffusion probabilistic modeling (DDPM) methodology, Gen-neG, that leverages this additional side-information. Our approach builds on generative adversarial networks (GANs) and discriminator guidance in diffusion models to guide the generation process towards the positive support region indicated by the oracle. We empirically establish the utility of Gen-neG in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
Uncertain Evidence in Probabilistic Models and Stochastic Simulators
Authors:
Andreas Munk,
Alexander Mead,
Frank Wood
Abstract:
We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently-proposed method "distributional evidence" as well a…
▽ More
We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently-proposed method "distributional evidence" as well as revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as "correct." We then compare inference results from each different interpretation illustrating the importance of careful consideration of uncertain evidence.
△ Less
Submitted 26 January, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Conditional Permutation Invariant Flows
Authors:
Berend Zwartsenberg,
Adam Ścibior,
Matthew Niedoba,
Vasileios Lioutas,
Yunpeng Liu,
Justice Sefas,
Setareh Dabiri,
Jonathan Wilder Lavington,
Trevor Campbell,
Frank Wood
Abstract:
We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including…
▽ More
We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from real data.
△ Less
Submitted 17 June, 2022;
originally announced June 2022.
-
Critic Sequential Monte Carlo
Authors:
Vasileios Lioutas,
Jonathan Wilder Lavington,
Justice Sefas,
Matthew Niedoba,
Yunpeng Liu,
Berend Zwartsenberg,
Setareh Dabiri,
Frank Wood,
Adam Scibior
Abstract:
We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard…
▽ More
We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard constraints placed sparsely in time. Compared with previous work, we modify the placement of such heuristic factors, which allows us to cheaply propose and evaluate large numbers of putative action particles, greatly increasing inference and planning efficiency. CriticSMC is compatible with informative priors, whose density function need not be known, and can be used as a model-free control algorithm. Our experiments on collision avoidance in a high-dimensional simulated driving task show that CriticSMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios.
△ Less
Submitted 21 January, 2023; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Gradients without Backpropagation
Authors:
Atılım Güneş Baydin,
Barak A. Pearlmutter,
Don Syme,
Frank Wood,
Philip Torr
Abstract:
Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can comp…
▽ More
Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.
△ Less
Submitted 17 February, 2022;
originally announced February 2022.
-
q-Paths: Generalizing the Geometric Annealing Path using Power Means
Authors:
Vaden Masrani,
Rob Brekelmans,
Thang Bui,
Frank Nielsen,
Aram Galstyan,
Greg Ver Steeg,
Frank Wood
Abstract:
Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of…
▽ More
Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $α$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.
△ Less
Submitted 1 July, 2021;
originally announced July 2021.
-
Differentiable Particle Filtering without Modifying the Forward Pass
Authors:
Adam Ścibior,
Frank Wood
Abstract:
Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a si…
▽ More
Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a simple correction to the particle weights. This correction utilizes the stop-gradient operator and does not modify the particle filter operation on the forward pass, while also being cheap and easy to compute. Surprisingly, with the same correction automatic differentiation also produces good estimators for gradients of expectations under the posterior. We can therefore regard our method as a general recipe for making particle filters differentiable. We additionally show that it produces desired estimators for second-order derivatives and how to extend it to further reduce variance at the expense of additional computation.
△ Less
Submitted 19 October, 2021; v1 submitted 18 June, 2021;
originally announced June 2021.
-
Imagining The Road Ahead: Multi-Agent Trajectory Prediction via Differentiable Simulation
Authors:
Adam Scibior,
Vasileios Lioutas,
Daniele Reda,
Peyman Bateni,
Frank Wood
Abstract:
We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional recurrent variational neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agen…
▽ More
We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional recurrent variational neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agent state using a kinematic bicycle model. The full simulation state is then differentiably rendered for each agent, initiating the next time step. We achieve state-of-the-art results on the INTERACTION dataset, using standard neural architectures and a standard variational training objective, producing realistic multi-modal predictions without any ad-hoc diversity-inducing losses. We conduct ablation studies to examine individual components of the simulator, finding that both the kinematic bicycle model and the continuous feedback from the birdview image are crucial for achieving this level of performance. We name our model ITRA, for "Imagining the Road Ahead".
△ Less
Submitted 22 April, 2021;
originally announced April 2021.
-
Robust Asymmetric Learning in POMDPs
Authors:
Andrew Warrington,
J. Wilder Lavington,
Adam Ścibior,
Mark Schmidt,
Frank Wood
Abstract:
Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial infor…
▽ More
Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information. We derive an objective to instead train the expert to maximize the expected reward of the imitating agent policy, and use it to construct an efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and the agent. We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert.
△ Less
Submitted 1 July, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
Uncertainty in Neural Processes
Authors:
Saeid Naderiparizi,
Kenny Chiu,
Benjamin Bloem-Reddy,
Frank Wood
Abstract:
We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data…
▽ More
We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments.
△ Less
Submitted 8 October, 2020;
originally announced October 2020.
-
Assisting the Adversary to Improve GAN Training
Authors:
Andreas Munk,
William Harvey,
Frank Wood
Abstract:
Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretic…
▽ More
Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, AdvAs is a theoretically motivated penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice and that AdvAs can lead to improvement of GAN training, as measured by FID scores.
△ Less
Submitted 8 December, 2020; v1 submitted 3 October, 2020;
originally announced October 2020.
-
All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference
Authors:
Rob Brekelmans,
Vaden Masrani,
Frank Wood,
Greg Ver Steeg,
Aram Galstyan
Abstract:
The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model lear…
▽ More
The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.
△ Less
Submitted 1 July, 2020;
originally announced July 2020.
-
Semi-supervised Sequential Generative Models
Authors:
Michael Teng,
Tuan Anh Le,
Adam Scibior,
Frank Wood
Abstract:
We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available. This instance of semi-supervised learning is challenging for existing methods, because the exponential number of possible discrete latent configurations results in high variance gradient estimators. We first overcome this problem by extendi…
▽ More
We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available. This instance of semi-supervised learning is challenging for existing methods, because the exponential number of possible discrete latent configurations results in high variance gradient estimators. We first overcome this problem by extending the standard semi-supervised generative modeling objective with reweighted wake-sleep. However, we find that this approach still suffers when the frequency of available labels varies between training sequences. Finally, we introduce a unified objective inspired by teacher-forcing and show that this approach is robust to variable length supervision. We call the resulting method caffeinated wake-sleep (CWS) to emphasize its additional dependence on real data. We demonstrate its effectiveness with experiments on MNIST, handwriting, and fruit fly trajectory data.
△ Less
Submitted 30 June, 2020;
originally announced July 2020.
-
Enhancing Few-Shot Image Classification with Unlabelled Examples
Authors:
Peyman Bateni,
Jarred Barber,
Jan-Willem van de Meent,
Frank Wood
Abstract:
We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a modified state of the art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on t…
▽ More
We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a modified state of the art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on transductive few-shot learning tasks, in which the goal is to jointly predict labels for query (test) examples given a set of support (training) examples. We achieve state of the art performance on the Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. All trained models and code have been made publicly available at github.com/plai-group/simple-cnaps.
△ Less
Submitted 21 October, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Planning as Inference in Epidemiological Models
Authors:
Frank Wood,
Andrew Warrington,
Saeid Naderiparizi,
Christian Weilbach,
Vaden Masrani,
William Harvey,
Adam Scibior,
Boyan Beronov,
John Grefenstette,
Duncan Campbell,
Ali Nasseri
Abstract:
In this work we demonstrate how to automate parts of the infectious disease-control policy-making process via performing inference in existing epidemiological models. The kind of inference tasks undertaken include computing the posterior distribution over controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Among oth…
▽ More
In this work we demonstrate how to automate parts of the infectious disease-control policy-making process via performing inference in existing epidemiological models. The kind of inference tasks undertaken include computing the posterior distribution over controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Among other things, we illustrate the use of a probabilistic programming language that automates inference in existing simulators. Neither the full capabilities of this tool for automating inference nor its utility for planning is widely disseminated at the current time. Timely gains in understanding about how such simulation-based models and inference automation tools applied in support of policymaking could lead to less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.
△ Less
Submitted 15 September, 2021; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Coping With Simulators That Don't Always Return
Authors:
Andrew Warrington,
Saeid Naderiparizi,
Frank Wood
Abstract:
Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaini…
▽ More
Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.
△ Less
Submitted 28 March, 2020;
originally announced March 2020.
-
Attention for Inference Compilation
Authors:
William Harvey,
Andreas Munk,
Atılım Güneş Baydin,
Alexander Bergholm,
Frank Wood
Abstract:
We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they…
▽ More
We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators.
△ Less
Submitted 25 October, 2019;
originally announced October 2019.
-
Probabilistic Surrogate Networks for Simulators with Unbounded Randomness
Authors:
Andreas Munk,
Berend Zwartsenberg,
Adam Ścibior,
Atılım Güneş Baydin,
Andrew Stewart,
Goran Fernlund,
Anoush Poursartip,
Frank Wood
Abstract:
We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentia…
▽ More
We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentially unbounded. Our framework further enables an automatic replacement of the reference simulator with the surrogate when undertaking amortized inference. The fidelity and speed of our surrogates allow for both faster stochastic simulation and accurate and substantially faster posterior inference. Using an illustrative yet non-trivial example we show our surrogates' ability to accurately model a probabilistic program with an unbounded number of random variables. We then proceed with an example that shows our surrogates are able to accurately model a complex structure like an unbounded stack in a program synthesis example. We further demonstrate how our surrogate modeling technique makes amortized inference in complex black-box simulators an order of magnitude faster. Specifically, we do simulator-based materials quality testing, inferring safety-critical latent internal temperature profiles of composite materials undergoing curing.
△ Less
Submitted 20 January, 2023; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Amortized Rejection Sampling in Universal Probabilistic Programming
Authors:
Saeid Naderiparizi,
Adam Ścibior,
Andreas Munk,
Mehrdad Ghadiri,
Atılım Güneş Baydin,
Bradley Gram-Hansen,
Christian Schroeder de Witt,
Robert Zinkov,
Philip H. S. Torr,
Tom Rainforth,
Yee Whye Teh,
Frank Wood
Abstract:
Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove fini…
▽ More
Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method's correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework.
△ Less
Submitted 28 March, 2022; v1 submitted 20 October, 2019;
originally announced October 2019.
-
The Virtual Patch Clamp: Imputing C. elegans Membrane Potentials from Calcium Imaging
Authors:
Andrew Warrington,
Arthur Spencer,
Frank Wood
Abstract:
We develop a stochastic whole-brain and body simulator of the nematode roundworm Caenorhabditis elegans (C. elegans) and show that it is sufficiently regularizing to allow imputation of latent membrane potentials from partial calcium fluorescence imaging observations. This is the first attempt we know of to "complete the circle," where an anatomically grounded whole-connectome simulator is used to…
▽ More
We develop a stochastic whole-brain and body simulator of the nematode roundworm Caenorhabditis elegans (C. elegans) and show that it is sufficiently regularizing to allow imputation of latent membrane potentials from partial calcium fluorescence imaging observations. This is the first attempt we know of to "complete the circle," where an anatomically grounded whole-connectome simulator is used to impute a time-varying "brain" state at single-cell fidelity from covariates that are measurable in practice. The sequential Monte Carlo (SMC) method we employ not only enables imputation of said latent states but also presents a strategy for learning simulator parameters via variational optimization of the noisy model evidence approximation provided by SMC. Our imputation and parameter estimation experiments were conducted on distributed systems using novel implementations of the aforementioned techniques applied to synthetic data of dimension and type representative of that which are measured in laboratories currently.
△ Less
Submitted 24 July, 2019;
originally announced July 2019.
-
Amortized Monte Carlo Integration
Authors:
Adam Goliński,
Frank Wood,
Tom Rainforth
Abstract:
Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing…
▽ More
Current approaches to amortizing Bayesian inference focus solely on approximating the posterior distribution. Typically, this approximation is, in turn, used to calculate expectations for one or more target functions - a computational pipeline which is inefficient when the target function(s) are known upfront. In this paper, we address this inefficiency by introducing AMCI, a method for amortizing Monte Carlo integration directly. AMCI operates similarly to amortized inference but produces three distinct amortized proposals, each tailored to a different component of the overall expectation calculation. At runtime, samples are produced separately from each amortized proposal, before being combined to an overall estimate of the expectation. We show that while existing approaches are fundamentally limited in the level of accuracy they can achieve, AMCI can theoretically produce arbitrarily small errors for any integrable target function using only a single sample from each proposal at runtime. We further show that it is able to empirically outperform the theoretically optimal self-normalized importance sampler on a number of example problems. Furthermore, AMCI allows not only for amortizing over datasets but also amortizing over target functions.
△ Less
Submitted 18 July, 2019;
originally announced July 2019.
-
Etalumis: Bringing Probabilistic Programming to Scientific Simulators at Scale
Authors:
Atılım Güneş Baydin,
Lei Shao,
Wahid Bhimji,
Lukas Heinrich,
Lawrence Meadows,
Jialin Liu,
Andreas Munk,
Saeid Naderiparizi,
Bradley Gram-Hansen,
Gilles Louppe,
Mingfei Ma,
Xiaohui Zhao,
Philip Torr,
Victor Lee,
Kyle Cranmer,
Prabhat,
Frank Wood
Abstract:
Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL frame…
▽ More
Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific simulators in a PPL, the computational cost of inference, and the lack of scalable implementations. To address these, we present a novel PPL framework that couples directly to existing scientific simulators through a cross-platform probabilistic execution protocol and provides Markov chain Monte Carlo (MCMC) and deep-learning-based inference compilation (IC) engines for tractable inference. To guide IC inference, we perform distributed training of a dynamic 3DCNN--LSTM architecture with a PyTorch-MPI-based framework on 1,024 32-core CPU nodes of the Cori supercomputer with a global minibatch size of 128k: achieving a performance of 450 Tflop/s through enhancements to PyTorch. We demonstrate a Large Hadron Collider (LHC) use-case with the C++ Sherpa simulator and achieve the largest-scale posterior inference in a Turing-complete PPL.
△ Less
Submitted 27 August, 2019; v1 submitted 7 July, 2019;
originally announced July 2019.
-
The Thermodynamic Variational Objective
Authors:
Vaden Masrani,
Tuan Anh Le,
Frank Wood
Abstract:
We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational variational evidence lower bound (ELBO) while remaining as broadly applicabl…
▽ More
We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational variational evidence lower bound (ELBO) while remaining as broadly applicable. We provide a computationally efficient gradient estimator for the TVO that applies to continuous, discrete, and non-reparameterizable distributions and show that the objective functions used in variational inference, variational autoencoders, wake sleep, and inference compilation are all special cases of the TVO. We use the TVO to learn both discrete and continuous deep generative models and empirically demonstrate state of the art model and inference network learning.
△ Less
Submitted 7 April, 2021; v1 submitted 28 June, 2019;
originally announced July 2019.
-
Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training
Authors:
William Harvey,
Michael Teng,
Frank Wood
Abstract:
Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. Hard attention mechanisms are typically non-differentiable. They can be trained with reinforcement learning but the high-variance training this entails hinders more widespread application. We show how hard attention for image classification can be framed as a Bayesian optimal e…
▽ More
Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. Hard attention mechanisms are typically non-differentiable. They can be trained with reinforcement learning but the high-variance training this entails hinders more widespread application. We show how hard attention for image classification can be framed as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour, and use it to generate `near-optimal' sequences of attention locations. We then show how to use such sequences to partially supervise, and therefore speed up, the training of a hard attention mechanism. Although generating these sequences is computationally expensive, they can be reused by any other networks later trained on the same task.
△ Less
Submitted 14 June, 2020; v1 submitted 12 June, 2019;
originally announced June 2019.
-
LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models
Authors:
Yuan Zhou,
Bradley J. Gram-Hansen,
Tobias Kohn,
Tom Rainforth,
Hongseok Yang,
Frank Wood
Abstract:
We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for b…
▽ More
We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure zero discontinuities to maintain the validity of the inference engine.
△ Less
Submitted 6 March, 2019;
originally announced March 2019.
-
An Introduction to Probabilistic Programming
Authors:
Jan-Willem van de Meent,
Brooks Paige,
Hongseok Yang,
Frank Wood
Abstract:
This book is a graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming la…
▽ More
This book is a graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages.
We start with a discussion of model-based reasoning and explain why conditioning is a foundational computation central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a first-order probabilistic programming language (PPL) whose programs correspond to graphical models with a known, finite, set of random variables. In the context of this PPL we introduce fundamental inference algorithms and describe how they can be implemented.
We then turn to higher-order probabilistic programming languages. Programs in such languages can define models with dynamic computation graphs, which may not instantiate the same set of random variables in each execution. Inference requires methods that generate samples by repeatedly evaluating the program. Foundational algorithms for this kind of language are discussed in the context of an interface between program executions and an inference controller.
Finally we consider the intersection of probabilistic and differentiable programming. We begin with a discussion of automatic differentiation, and how it can be used to implement efficient inference methods based on Hamiltonian Monte Carlo. We then discuss gradient-based maximum likelihood estimation in programs that are parameterized using neural networks, how to amortize inference using by learning neural approximations to the program posterior, and how language features impact the design of deep probabilistic programming systems.
△ Less
Submitted 19 October, 2021; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
Authors:
Atılım Güneş Baydin,
Lukas Heinrich,
Wahid Bhimji,
Lei Shao,
Saeid Naderiparizi,
Andreas Munk,
Jialin Liu,
Bradley Gram-Hansen,
Gilles Louppe,
Lawrence Meadows,
Philip Torr,
Victor Lee,
Prabhat,
Kyle Cranmer,
Frank Wood
Abstract:
We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable po…
▽ More
We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.
△ Less
Submitted 17 February, 2020; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Inference Trees: Adaptive Inference with Exploration
Authors:
Tom Rainforth,
Yuan Zhou,
Xiaoyu Lu,
Yee Whye Teh,
Frank Wood,
Hongseok Yang,
Jan-Willem van de Meent
Abstract:
We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partiti…
▽ More
We introduce inference trees (ITs), a new class of inference methods that build on ideas from Monte Carlo tree search to perform adaptive sampling in a manner that balances exploration with exploitation, ensures consistency, and alleviates pathologies in existing adaptive methods. ITs adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner. This enables ITs to not only identify regions of high posterior mass, but also maintain uncertainty estimates to track regions where significant posterior mass may have been missed. ITs can be based on any inference method that provides a consistent estimate of the marginal likelihood. They are particularly effective when combined with sequential Monte Carlo, where they capture long-range dependencies and yield improvements beyond proposal adaptation alone.
△ Less
Submitted 25 June, 2018;
originally announced June 2018.
-
Deep Variational Reinforcement Learning for POMDPs
Authors:
Maximilian Igl,
Luisa Zintgraf,
Tuan Anh Le,
Frank Wood,
Shimon Whiteson
Abstract:
Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bia…
▽ More
Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.
△ Less
Submitted 6 June, 2018;
originally announced June 2018.
-
Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow
Authors:
Tuan Anh Le,
Adam R. Kosiorek,
N. Siddharth,
Yee Whye Teh,
Frank Wood
Abstract:
Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging as most approaches targeting discrete variables rely on their continuous relaxations---which can be intractable in SCFMs, as branching on relaxations requires evaluating all (exponentially many) branching…
▽ More
Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging as most approaches targeting discrete variables rely on their continuous relaxations---which can be intractable in SCFMs, as branching on relaxations requires evaluating all (exponentially many) branching paths. Tractable alternatives mainly combine REINFORCE with complex control-variate schemes to improve the variance of naive estimators. Here, we revisit the reweighted wake-sleep (RWS) (Bornschein and Bengio, 2015) algorithm, and through extensive evaluations, show that it outperforms current state-of-the-art methods in learning SCFMs. Further, in contrast to the importance weighted autoencoder, we observe that RWS learns better models and inference networks with increasing numbers of particles. Our results suggest that RWS is a competitive, often preferable, alternative for learning SCFMs.
△ Less
Submitted 16 September, 2019; v1 submitted 26 May, 2018;
originally announced May 2018.
-
Hamiltonian Monte Carlo for Probabilistic Programs with Discontinuities
Authors:
Bradley Gram-Hansen,
Yuan Zhou,
Tobias Kohn,
Tom Rainforth,
Hongseok Yang,
Frank Wood
Abstract:
Hamiltonian Monte Carlo (HMC) is arguably the dominant statistical inference algorithm used in most popular "first-order differentiable" Probabilistic Programming Languages (PPLs). However, the fact that HMC uses derivative information causes complications when the target distribution is non-differentiable with respect to one or more of the latent variables. In this paper, we show how to use exten…
▽ More
Hamiltonian Monte Carlo (HMC) is arguably the dominant statistical inference algorithm used in most popular "first-order differentiable" Probabilistic Programming Languages (PPLs). However, the fact that HMC uses derivative information causes complications when the target distribution is non-differentiable with respect to one or more of the latent variables. In this paper, we show how to use extensions to HMC to perform inference in probabilistic programs that contain discontinuities. To do this, we design a Simple first-order Probabilistic Programming Language (SPPL) that contains a sufficient set of language restrictions together with a compilation scheme. This enables us to preserve both the statistical and syntactic interpretation of if-else statements in the probabilistic program, within the scope of first-order PPLs. We also provide a corresponding mathematical formalism that ensures any joint density denoted in such a language has a suitably low measure of discontinuities.
△ Less
Submitted 2 January, 2019; v1 submitted 7 April, 2018;
originally announced April 2018.
-
High Throughput Synchronous Distributed Stochastic Gradient Descent
Authors:
Michael Teng,
Frank Wood
Abstract:
We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to perform joint posterior predictive inference of the mini-batch gradient computation times of all worker-nodes in a parallel computing cluster. We show that a synchron…
▽ More
We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to perform joint posterior predictive inference of the mini-batch gradient computation times of all worker-nodes in a parallel computing cluster. We show that a synchronous parameter server can, by utilizing such a model, choose an optimal cutoff time beyond which mini-batch gradient messages from slow workers are ignored that maximizes overall mini-batch gradient computations per second. In keeping with earlier findings we observe that, under realistic conditions, eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but actually increases the overall rate of convergence as a function of wall-clock time by virtue of eliminating idleness. The principal novel contribution and finding of this work goes beyond this by demonstrating that using the predicted run-times from a generative model of cluster worker performance to dynamically adjust the cutoff improves substantially over the static-cutoff prior art, leading to, among other things, significantly reduced deep neural net training times on large computer clusters.
△ Less
Submitted 12 March, 2018;
originally announced March 2018.
-
Tighter Variational Bounds are Not Necessarily Better
Authors:
Tom Rainforth,
Adam R. Kosiorek,
Tuan Anh Le,
Chris J. Maddison,
Maximilian Igl,
Frank Wood,
Yee Whye Teh
Abstract:
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization sc…
▽ More
We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.
△ Less
Submitted 5 March, 2019; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Faithful Inversion of Generative Models for Effective Amortized Inference
Authors:
Stefan Webb,
Adam Golinski,
Robert Zinkov,
N. Siddharth,
Tom Rainforth,
Yee Whye Teh,
Frank Wood
Abstract:
Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. Generally, they require the inversion of the dependency structure in the generative model, as the modeller must learn a mapping from observations to distributions approximating the posterior. Previous approaches have involved inverting the dependency stru…
▽ More
Inference amortization methods share information across multiple posterior-inference problems, allowing each to be carried out more efficiently. Generally, they require the inversion of the dependency structure in the generative model, as the modeller must learn a mapping from observations to distributions approximating the posterior. Previous approaches have involved inverting the dependency structure in a heuristic way that fails to capture these dependencies correctly, thereby limiting the achievable accuracy of the resulting approximations. We introduce an algorithm for faithfully, and minimally, inverting the graphical model structure of any generative model. Such inverses have two crucial properties: (a) they do not encode any independence assertions that are absent from the model and; (b) they are local maxima for the number of true independencies encoded. We prove the correctness of our approach and empirically show that the resulting minimally faithful inverses lead to better inference amortization than existing heuristic approaches.
△ Less
Submitted 29 November, 2018; v1 submitted 1 December, 2017;
originally announced December 2017.
-
Updating the VESICLE-CNN Synapse Detector
Authors:
Andrew Warrington,
Frank Wood
Abstract:
We present an updated version of the VESICLE-CNN algorithm presented by Roncal et al. (2014). The original implementation makes use of a patch-based approach. This methodology is known to be slow due to repeated computations. We update this implementation to be fully convolutional through the use of dilated convolutions, recovering the expanded field of view achieved through the use of strided max…
▽ More
We present an updated version of the VESICLE-CNN algorithm presented by Roncal et al. (2014). The original implementation makes use of a patch-based approach. This methodology is known to be slow due to repeated computations. We update this implementation to be fully convolutional through the use of dilated convolutions, recovering the expanded field of view achieved through the use of strided maxpools, but without a degradation of spatial resolution. This updated implementation performs as well as the original implementation, but with a $600\times$ speedup at test time. We release source code and data into the public domain.
△ Less
Submitted 31 October, 2017;
originally announced October 2017.
-
On Nesting Monte Carlo Estimators
Authors:
Tom Rainforth,
Robert Cornish,
Hongseok Yang,
Andrew Warrington,
Frank Wood
Abstract:
Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation. For such problems, one must nest estimators, such that terms in an outer estimator themselves involve calculation of a separate, nested, estimation. We investigate the statistical implications of nesting MC estimators, including cases of multiple levels of n…
▽ More
Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation. For such problems, one must nest estimators, such that terms in an outer estimator themselves involve calculation of a separate, nested, estimation. We investigate the statistical implications of nesting MC estimators, including cases of multiple levels of nesting, and establish the conditions under which they converge. We derive corresponding rates of convergence and provide empirical evidence that these rates are observed in practice. We further establish a number of pitfalls that can arise from naive nesting of MC estimators, provide guidelines about how these can be avoided, and lay out novel methods for reformulating certain classes of nested expectation problems into single expectations, leading to improved convergence rates. We demonstrate the applicability of our work by using our results to develop a new estimator for discrete Bayesian experimental design problems and derive error bounds for a class of variational objectives.
△ Less
Submitted 23 May, 2018; v1 submitted 18 September, 2017;
originally announced September 2017.
-
Bayesian Optimization for Probabilistic Programs
Authors:
Tom Rainforth,
Tuan Anh Le,
Jan-Willem van de Meent,
Michael A. Osborne,
Frank Wood
Abstract:
We present the first general purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimiz…
▽ More
We present the first general purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimization package to directly exploit the source code of its target, leading to innovations in problem-independent hyperpriors, unbounded optimization, and implicit constraint satisfaction; delivering significant performance improvements over prominent existing packages. We present applications of our method to a number of tasks including engineering design and parameter optimization.
△ Less
Submitted 13 July, 2017;
originally announced July 2017.
-
Learning Disentangled Representations with Semi-Supervised Deep Generative Models
Authors:
N. Siddharth,
Brooks Paige,
Jan-Willem van de Meent,
Alban Desmaison,
Noah D. Goodman,
Pushmeet Kohli,
Frank Wood,
Philip H. S. Torr
Abstract:
Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectur…
▽ More
Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure. We evaluate our framework's ability to learn disentangled representations, both by qualitative exploration of its generative capacity, and quantitative evaluation of its discriminative ability on a variety of models and datasets.
△ Less
Submitted 13 November, 2017; v1 submitted 1 June, 2017;
originally announced June 2017.
-
Auto-Encoding Sequential Monte Carlo
Authors:
Tuan Anh Le,
Maximilian Igl,
Tom Rainforth,
Tom Jin,
Frank Wood
Abstract:
We build on auto-encoding sequential Monte Carlo (AESMC): a method for model and proposal learning based on maximizing the lower bound to the log marginal likelihood in a broad family of structured probabilistic models. Our approach relies on the efficiency of sequential Monte Carlo (SMC) for performing inference in structured probabilistic models and the flexibility of deep neural networks to mod…
▽ More
We build on auto-encoding sequential Monte Carlo (AESMC): a method for model and proposal learning based on maximizing the lower bound to the log marginal likelihood in a broad family of structured probabilistic models. Our approach relies on the efficiency of sequential Monte Carlo (SMC) for performing inference in structured probabilistic models and the flexibility of deep neural networks to model complex conditional probability distributions. We develop additional theoretical insights and introduce a new training procedure which improves both model and proposal learning. We demonstrate that our approach provides a fast, easy-to-implement and scalable means for simultaneous model learning and proposal adaptation in deep generative models.
△ Less
Submitted 5 April, 2018; v1 submitted 29 May, 2017;
originally announced May 2017.
-
Online Learning Rate Adaptation with Hypergradient Descent
Authors:
Atilim Gunes Baydin,
Robert Cornish,
David Martinez Rubio,
Mark Schmidt,
Frank Wood
Abstract:
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manua…
▽ More
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
△ Less
Submitted 25 February, 2018; v1 submitted 14 March, 2017;
originally announced March 2017.
-
Using Synthetic Data to Train Neural Networks is Model-Based Reasoning
Authors:
Tuan Anh Le,
Atilim Gunes Baydin,
Robert Zinkov,
Frank Wood
Abstract:
We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where…
▽ More
We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.
△ Less
Submitted 2 March, 2017;
originally announced March 2017.
-
On the Pitfalls of Nested Monte Carlo
Authors:
Tom Rainforth,
Robert Cornish,
Hongseok Yang,
Frank Wood
Abstract:
There is an increasing interest in estimating expectations outside of the classical inference framework, such as for models expressed as probabilistic programs. Many of these contexts call for some form of nested inference to be applied. In this paper, we analyse the behaviour of nested Monte Carlo (NMC) schemes, for which classical convergence proofs are insufficient. We give conditions under whi…
▽ More
There is an increasing interest in estimating expectations outside of the classical inference framework, such as for models expressed as probabilistic programs. Many of these contexts call for some form of nested inference to be applied. In this paper, we analyse the behaviour of nested Monte Carlo (NMC) schemes, for which classical convergence proofs are insufficient. We give conditions under which NMC will converge, establish a rate of convergence, and provide empirical data that suggests that this rate is observable in practice. Finally, we prove that general-purpose nested inference schemes are inherently biased. Our results serve to warn of the dangers associated with naive composition of inference and models.
△ Less
Submitted 3 December, 2016;
originally announced December 2016.
-
Inducing Interpretable Representations with Variational Autoencoders
Authors:
N. Siddharth,
Brooks Paige,
Alban Desmaison,
Jan-Willem Van de Meent,
Frank Wood,
Noah D. Goodman,
Pushmeet Kohli,
Philip H. S. Torr
Abstract:
We develop a framework for incorporating structured graphical models in the \emph{encoders} of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy,…
▽ More
We develop a framework for incorporating structured graphical models in the \emph{encoders} of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high-dimensional domains where it is often difficult to model all the variation. Learning in this framework is carried out end-to-end with a variational objective, applying to both unsupervised and semi-supervised schemes.
△ Less
Submitted 22 November, 2016;
originally announced November 2016.
-
Probabilistic structure discovery in time series data
Authors:
David Janz,
Brooks Paige,
Tom Rainforth,
Jan-Willem van de Meent,
Frank Wood
Abstract:
Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty esti…
▽ More
Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach, inferring a full posterior over structures, which more reliably captures the uncertainty of the model.
△ Less
Submitted 21 November, 2016;
originally announced November 2016.
-
Inference Compilation and Universal Probabilistic Programming
Authors:
Tuan Anh Le,
Atilim Gunes Baydin,
Frank Wood
Abstract:
We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference…
▽ More
We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.
△ Less
Submitted 2 March, 2017; v1 submitted 31 October, 2016;
originally announced October 2016.
-
Inference Networks for Sequential Monte Carlo in Graphical Models
Authors:
Brooks Paige,
Frank Wood
Abstract:
We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a con…
▽ More
We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.
△ Less
Submitted 7 March, 2018; v1 submitted 22 February, 2016;
originally announced February 2016.
-
Interacting Particle Markov Chain Monte Carlo
Authors:
Tom Rainforth,
Christian A. Naesseth,
Fredrik Lindsten,
Brooks Paige,
Jan-Willem van de Meent,
Arnaud Doucet,
Frank Wood
Abstract:
We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method based on an interacting pool of standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing rates relative to both non-interacting PMCMC samplers, and a si…
▽ More
We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method based on an interacting pool of standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing rates relative to both non-interacting PMCMC samplers, and a single PMCMC sampler with an equivalent memory and computational budget. An additional advantage of the iPMCMC method is that it is suitable for distributed and multi-core architectures.
△ Less
Submitted 12 April, 2017; v1 submitted 16 February, 2016;
originally announced February 2016.