-
BALI: Learning Neural Networks via Bayesian Layerwise Inference
Authors:
Richard Kurle,
Alexej Klushyn,
Ralf Herbrich
Abstract:
We introduce a new method for learning Bayesian neural networks, treating them as a stack of multivariate Bayesian linear regression models. The main idea is to infer the layerwise posterior exactly if we know the target outputs of each layer. We define these pseudo-targets as the layer outputs from the forward pass, updated by the backpropagated gradients of the objective function. The resulting layerwise posterior is a matrix-normal distribution with a Kronecker-factorized covariance matrix, which can be efficiently inverted. Our method extends to the stochastic mini-batch setting using an exponential moving average over natural-parameter terms, thus gradually forgetting older data. The method converges in few iterations and performs as well as or better than leading Bayesian neural network methods on various regression, classification, and out-of-distribution detection benchmarks.
Submitted 18 November, 2024;
originally announced November 2024.
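The core step described in the abstract above, exact inference in a multivariate Bayesian linear regression once pseudo-targets are known, can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: a zero-mean matrix-normal prior with row precision `prior_prec * I` and a column covariance Sigma shared with the observation noise (so the posterior covariance is the Kronecker product Sigma ⊗ Lambda^-1), pseudo-targets formed by one gradient step of a hypothetical size `step`, and a hypothetical EMA weight `decay` applied to the data terms X^T X and X^T Y that enter the natural parameters.

```python
import numpy as np

def pseudo_targets(layer_out, grad, step):
    """Forward-pass outputs moved against the backpropagated gradient (step size is hypothetical)."""
    return layer_out - step * grad

class BayesLinearLayer:
    """Multivariate Bayesian linear regression Y = X W + E for one layer (illustrative sketch).

    Assumed prior: W ~ MN(0, (prior_prec * I)^-1, Sigma) with noise rows ~ N(0, Sigma).
    The posterior is MN(M, Lambda^-1, Sigma) with
    Lambda = prior_prec * I + X^T X and M = Lambda^-1 X^T Y.
    """

    def __init__(self, d_in, d_out, prior_prec=1.0, decay=0.9):
        self.prior_prec = prior_prec
        self.decay = decay                    # weight kept on older statistics
        self.xx = np.zeros((d_in, d_in))      # running X^T X
        self.xy = np.zeros((d_in, d_out))     # running X^T Y
        self.initialised = False

    def update(self, x, y):
        """Blend mini-batch statistics into the running ones, gradually forgetting old batches."""
        xx, xy = x.T @ x, x.T @ y
        if not self.initialised:
            self.xx, self.xy, self.initialised = xx, xy, True
        else:
            self.xx = self.decay * self.xx + (1 - self.decay) * xx
            self.xy = self.decay * self.xy + (1 - self.decay) * xy

    def posterior(self):
        """Return the posterior mean M and the row covariance Lambda^-1."""
        prec = self.prior_prec * np.eye(self.xx.shape[0]) + self.xx
        return np.linalg.solve(prec, self.xy), np.linalg.inv(prec)

# Usage: noise-free targets from a known weight matrix (zero gradient, so targets = outputs).
rng = np.random.default_rng(0)
w_true = rng.normal(size=(3, 2))
layer = BayesLinearLayer(3, 2, prior_prec=1e-6, decay=0.9)
for _ in range(5):
    x = rng.normal(size=(50, 3))
    layer.update(x, pseudo_targets(x @ w_true, np.zeros((50, 2)), step=0.1))
mean, row_cov = layer.posterior()
```

With noise-free targets and a weak prior, the posterior mean recovers `w_true`; the EMA keeps the update cost per mini-batch independent of how many batches have been seen.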
-
Learning Flat Latent Manifolds with VAEs
Authors:
Nutan Chen,
Alexej Klushyn,
Francesco Ferroni,
Justin Bayer,
Patrick van der Smagt
Abstract:
Measuring the similarity between data points often requires domain knowledge, which can in part be compensated by relying on unsupervised methods such as latent-variable models, where similarity/distance is estimated in a more compact latent space. Prevalent is the use of the Euclidean metric, which has the drawback of ignoring information about similarity of data stored in the decoder, as captured by the framework of Riemannian geometry. We propose an extension to the framework of variational auto-encoders that allows learning flat latent manifolds, where the Euclidean metric is a proxy for the similarity between data points. This is achieved by defining the latent space as a Riemannian manifold and by regularising the metric tensor to be a scaled identity matrix. Additionally, we replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one, and formulate the learning problem as a constrained optimisation problem. We evaluate our method on a range of data-sets, including a video-tracking benchmark, where the performance of our unsupervised approach nears that of state-of-the-art supervised approaches, while retaining the computational efficiency of straight-line-based approaches.
Submitted 12 August, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
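The flatness regulariser mentioned above (pushing the metric tensor towards a scaled identity) can be illustrated with a small NumPy sketch. Assumptions for illustration only: the metric is the pullback G(z) = J(z)^T J(z) of a deterministic decoder, the Jacobian is estimated by central finite differences rather than automatic differentiation, and the scale `c` is a fixed constant supplied by the caller.

```python
import numpy as np

def jacobian_fd(decoder, z, eps=1e-5):
    """Central finite-difference Jacobian of decoder: R^d -> R^D at z, returned with shape (D, d)."""
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((decoder(z + dz) - decoder(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)

def flatness_penalty(decoder, z, c):
    """Squared Frobenius distance between the pullback metric G = J^T J and c * I."""
    J = jacobian_fd(decoder, z)
    G = J.T @ J
    return float(np.sum((G - c * np.eye(G.shape[0])) ** 2))

# Usage: a linear decoder whose columns are orthonormal and scaled by 2 has G = 4 I everywhere.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # 5x2 matrix with orthonormal columns
decoder = lambda z: 2.0 * (q @ z)
z = np.array([0.3, -0.7])
flat = flatness_penalty(decoder, z, c=4.0)     # matches the target scale
off = flatness_penalty(decoder, z, c=1.0)      # wrong scale, so a positive penalty
```

In training, a term like this would be averaged over latent samples and added to the objective; a decoder that satisfies it makes straight latent lines correspond (up to scale) to distances in observation space.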
-
Increasing the Generalisation Capacity of Conditional VAEs
Authors:
Alexej Klushyn,
Nutan Chen,
Botond Cseke,
Justin Bayer,
Patrick van der Smagt
Abstract:
We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations to increase the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior that enables the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results that show a significantly higher generalisation capability.
Submitted 10 September, 2019; v1 submitted 23 August, 2019;
originally announced August 2019.
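One common way to build a multimodal prior over latent variables is a Gaussian mixture. The sketch below evaluates such a prior's log-density with a numerically stable log-sum-exp. This is an illustrative assumption only: the abstract does not state that the authors' prior is a fixed mixture, and the component weights, means, and standard deviations here are hypothetical.

```python
import numpy as np

def log_mixture_prior(z, weights, means, stds):
    """log p(z) for a diagonal-Gaussian mixture: log sum_k w_k N(z; mu_k, diag(sigma_k^2)).

    Shapes: z (d,), weights (K,), means (K, d), stds (K, d).
    """
    z = np.asarray(z, dtype=float)
    # Per-component diagonal Gaussian log-densities, shape (K,)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * stds ** 2) + ((z - means) / stds) ** 2, axis=1)
    a = np.log(weights) + log_norm
    m = np.max(a)
    return float(m + np.log(np.sum(np.exp(a - m))))  # stable log-sum-exp

# Usage: a 1-D bimodal prior with modes at -3 and +3.
weights = np.array([0.5, 0.5])
means = np.array([[-3.0], [3.0]])
stds = np.array([[0.5], [0.5]])
at_mode = log_mixture_prior([3.0], weights, means, stds)
at_gap = log_mixture_prior([0.0], weights, means, stds)
```

Unlike a single standard normal, the mixture assigns low density between its modes, which is what allows separate clusters of latent codes to represent distinct solutions.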
-
Learning Hierarchical Priors in VAEs
Authors:
Alexej Klushyn,
Nutan Chen,
Richard Kurle,
Botond Cseke,
Patrick van der Smagt
Abstract:
We propose to learn a hierarchical prior in the context of variational autoencoders to avoid the over-regularisation resulting from a standard normal prior distribution. To incentivise an informative latent representation of the data, we formulate the learning problem as a constrained optimisation problem by extending the Taming VAEs framework to two-level hierarchical models. We introduce a graph-based interpolation method, which shows that the topology of the learned latent representation corresponds to the topology of the data manifold, and present several examples where desired properties of the latent representation, such as smoothness and simple explanatory factors, are learned by the prior.
Submitted 5 October, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
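The constrained formulation mentioned above (minimise the KL term subject to a bound `kappa` on the reconstruction error) is usually handled with a Lagrange multiplier. The pure-Python sketch below shows a generic projected dual-ascent update driven by a moving average of the constraint. The update rule, learning rate `lr`, and averaging weight `ema` are assumptions for illustration; this is not the exact multiplier schedule used in the paper or in Taming VAEs.

```python
def lagrangian(kl, recon_err, kappa, lam):
    """Lagrangian for: minimise KL subject to recon_err <= kappa."""
    return kl + lam * (recon_err - kappa)

class MultiplierUpdate:
    """Projected dual ascent on the multiplier lambda (hypothetical rates)."""

    def __init__(self, lam=1.0, lr=0.1, ema=0.9):
        self.lam, self.lr, self.ema = lam, lr, ema
        self.c_avg = None  # moving average of the constraint value

    def step(self, recon_err, kappa):
        c = recon_err - kappa
        self.c_avg = c if self.c_avg is None else self.ema * self.c_avg + (1 - self.ema) * c
        # A violated constraint (c_avg > 0) raises lambda; a satisfied one lowers it, floored at zero.
        self.lam = max(0.0, self.lam + self.lr * self.c_avg)
        return self.lam

# Usage: one violated step followed by one satisfied step.
update = MultiplierUpdate(lam=1.0, lr=0.1, ema=0.9)
lam_after_violation = update.step(recon_err=2.0, kappa=1.0)
lam_after_recovery = update.step(recon_err=0.0, kappa=1.0)
```

The moving average keeps the multiplier from reacting to single noisy mini-batches; the model parameters would be updated by minimising `lagrangian` with the current multiplier held fixed.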
-
Fast Approximate Geodesics for Deep Generative Models
Authors:
Nutan Chen,
Francesco Ferroni,
Alexej Klushyn,
Alexandros Paraschos,
Justin Bayer,
Patrick van der Smagt
Abstract:
The length of the geodesic between two data points along a Riemannian manifold, induced by a deep generative model, yields a principled measure of similarity. Current approaches are limited to low-dimensional latent spaces, due to the computational complexity of solving a non-convex optimisation problem. We propose finding shortest paths in a finite graph of samples from the aggregate approximate posterior, which can be solved exactly, at greatly reduced runtime, and without a notable loss in quality. Our approach is therefore applicable to high-dimensional problems, e.g., in the visual domain. We validate our approach empirically on a series of experiments using variational autoencoders applied to image data, including the Chair, FashionMNIST, and human movement data sets.
Submitted 23 May, 2019; v1 submitted 19 December, 2018;
originally announced December 2018.
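The graph-based approach in this abstract can be illustrated with a short pure-Python sketch: connect latent samples to their `k` nearest neighbours, weight each edge by the distance between the decoded points, and run Dijkstra's algorithm. The k-NN construction, the edge weighting (a first-order approximation of the Riemannian length of a short segment), and the toy `decode` function are assumptions for illustration rather than the paper's exact construction; in practice the latents would be samples from the aggregate approximate posterior.

```python
import heapq
import math

def knn_graph(latents, decode, k):
    """Undirected graph linking each latent sample to its k nearest latent neighbours.

    Edge weight: distance between the decoded points (assumed approximation of segment length).
    """
    decoded = [decode(z) for z in latents]
    graph = {i: [] for i in range(len(latents))}
    for i, zi in enumerate(latents):
        nearest = sorted((math.dist(zi, zj), j) for j, zj in enumerate(latents) if j != i)[:k]
        for _, j in nearest:
            w = math.dist(decoded[i], decoded[j])
            graph[i].append((j, w))
            graph[j].append((i, w))  # keep the graph undirected
    return graph

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns (length, node list), or (inf, []) if dst is unreachable."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return math.inf, []

# Usage: four latents on a line, decoded by a hypothetical map that doubles all distances.
latents = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
decode = lambda z: (2.0 * z[0], 2.0 * z[1])
graph = knn_graph(latents, decode, k=1)
length, path = shortest_path(graph, 0, 3)
```

Because the shortest-path problem on a finite graph is solved exactly, the cost is dominated by building the graph rather than by a non-convex curve optimisation.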
-
Active Learning based on Data Uncertainty and Model Sensitivity
Authors:
Nutan Chen,
Alexej Klushyn,
Alexandros Paraschos,
Djalel Benbouzid,
Patrick van der Smagt
Abstract:
Robots can rapidly acquire new skills from demonstrations. However, during generalisation of skills or transitioning across fundamentally different skills, it is unclear whether the robot has the necessary knowledge to perform the task. Failing to detect missing information often leads to abrupt movements or to collisions with the environment. Active learning can quantify the uncertainty of performing the task and, in general, locate regions of missing information. We introduce a novel algorithm for active learning and demonstrate its utility for generating smooth trajectories. Our approach is based on deep generative models and metric learning in latent spaces. It relies on the Jacobian of the likelihood to detect non-smooth transitions in the latent space, i.e., transitions that lead to abrupt changes in the movement of the robot. When non-smooth transitions are detected, our algorithm asks for an additional demonstration from that specific region. The newly acquired knowledge modifies the data manifold and allows for learning a latent representation for generating smooth movements. We demonstrate the efficacy of our approach on generalising elementary skills, transitioning across different skills, and implicitly avoiding collisions with the environment. For our experiments, we use a simulated pendulum, whose motion we observe from images, and a 7-DoF anthropomorphic arm.
Submitted 6 August, 2018;
originally announced August 2018.
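The detection step described above can be approximated with a simple sensitivity check along a latent path. This is a simplification chosen for illustration, not the paper's algorithm: the decoder's mean output stands in for the likelihood, the Jacobian norm is estimated by finite differences between consecutive path points, and `threshold` is a hypothetical user choice.

```python
import numpy as np

def flag_nonsmooth(decoder, z_path, threshold):
    """Return indices i where the segment z_path[i] -> z_path[i+1] looks non-smooth.

    The ratio ||f(z_{i+1}) - f(z_i)|| / ||z_{i+1} - z_i|| is a finite-difference estimate
    of the Jacobian norm along the path; values above `threshold` mark regions where an
    additional demonstration could be requested.
    """
    outputs = [np.asarray(decoder(z), dtype=float) for z in z_path]
    flagged = []
    for i in range(len(z_path) - 1):
        dz = np.linalg.norm(np.asarray(z_path[i + 1], dtype=float) - np.asarray(z_path[i], dtype=float))
        dx = np.linalg.norm(outputs[i + 1] - outputs[i])
        if dz > 0 and dx / dz > threshold:
            flagged.append(i)
    return flagged

# Usage: a hypothetical 1-D decoder with a sharp jump near z = 0.
decoder = lambda z: np.array([np.tanh(50.0 * z[0])])
z_path = [np.array([v]) for v in np.linspace(-1.0, 1.0, 21)]
flagged = flag_nonsmooth(decoder, z_path, threshold=1.0)
```

Only the two segments straddling the jump exceed the threshold, so a query would be issued for that region alone.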
-
Metrics for Deep Generative Models
Authors:
Nutan Chen,
Alexej Klushyn,
Richard Kurle,
Xueyan Jiang,
Justin Bayer,
Patrick van der Smagt
Abstract:
Neural samplers such as variational autoencoders (VAEs) or generative adversarial networks (GANs) approximate distributions by transforming samples from a simple random source (the latent space) to samples from a more complex distribution represented by a dataset. While the manifold hypothesis implies that the density induced by a dataset contains large regions of low density, the training criteria of VAEs and GANs will make the latent space densely covered. Consequently, points that are separated by low-density regions in observation space will be pushed together in latent space, making stationary distances poor proxies for similarity. We transfer ideas from Riemannian geometry to this setting, letting the distance between two points be the shortest path on a Riemannian manifold induced by the transformation. The method yields a principled distance measure, provides a tool for visual inspection of deep generative models, and offers an alternative to linear interpolation in latent space. In addition, it can be applied to robot movement generalization using previously learned skills. The method is evaluated on a synthetic dataset with known ground truth; on a simulated robot arm dataset; on human motion capture data; and on a generative model of handwritten digits.
Submitted 8 February, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
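The Riemannian distance described above is the minimum, over latent curves joining two points, of the curve's length measured through the generator. A minimal NumPy sketch of that length for a discretised curve follows; the toy decoder and the number of points `n` are assumptions for illustration. For a fine discretisation, summing observation-space segment lengths approximates the integral of sqrt(dz^T G(z) dz) with G = J^T J.

```python
import numpy as np

def decoded_curve_length(decoder, z_path):
    """Approximate Riemannian length of a latent curve: sum of observation-space segment lengths."""
    xs = np.array([decoder(z) for z in z_path], dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(xs, axis=0), axis=1)))

def straight_line(z0, z1, n=50):
    """n evenly spaced points on the straight latent segment from z0 to z1."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) * np.asarray(z0, dtype=float) + t * np.asarray(z1, dtype=float)

# Usage: a hypothetical decoder that stretches the first latent axis by a factor of 3.
decoder = lambda z: np.array([3.0 * z[0], z[1]])
path = straight_line([0.0, 0.0], [1.0, 0.0])
manifold_length = decoded_curve_length(decoder, path)     # length seen in observation space
latent_length = decoded_curve_length(lambda z: z, path)   # plain Euclidean latent length
```

The two lengths differ by the stretching factor, which is exactly the information a stationary latent distance ignores; a geodesic would be found by minimising `decoded_curve_length` over curves with fixed endpoints.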