-
Artificial Kuramoto Oscillatory Neurons
Authors:
Takeru Miyato,
Sindy Löwe,
Andreas Geiger,
Max Welling
Abstract:
It has long been known in both neuroscience and AI that ``binding'' between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. More recently, it was also hypothesized that dynamic (spatiotemporal) representations play an important role in both neuroscience and AI. Building on these ideas…
▽ More
It has long been known in both neuroscience and AI that ``binding'' between neurons leads to a form of competitive learning where representations are compressed in order to represent more abstract concepts in deeper layers of the network. More recently, it was also hypothesized that dynamic (spatiotemporal) representations play an important role in both neuroscience and AI. Building on these ideas, we introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. Our generalized Kuramoto updates bind neurons together through their synchronization dynamics. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning. We believe that these empirical results show the importance of rethinking our assumptions at the most basic neuronal level of neural representation, and in particular show the importance of dynamical representations. Code:https://github.com/autonomousvision/akorn Project page:https://takerum.github.io/akorn_project_page/
△ Less
Submitted 16 May, 2025; v1 submitted 17 October, 2024;
originally announced October 2024.
-
GUD: Generation with Unified Diffusion
Authors:
Mathis Gerdes,
Max Welling,
Miranda C. N. Cheng
Abstract:
Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates (e.g. pixel-, PCA-, Fouri…
▽ More
Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples. Inspired by concepts from the renormalization group in physics, which analyzes systems across different scales, we revisit diffusion models by exploring three key design aspects: 1) the choice of representation in which the diffusion process operates (e.g. pixel-, PCA-, Fourier-, or wavelet-basis), 2) the prior distribution that data is transformed into during diffusion (e.g. Gaussian with covariance $Σ$), and 3) the scheduling of noise levels applied separately to different parts of the data, captured by a component-wise noise schedule. Incorporating the flexibility in these choices, we develop a unified framework for diffusion generative models with greatly enhanced design freedom. In particular, we introduce soft-conditioning models that smoothly interpolate between standard diffusion models and autoregressive models (in any basis), conceptually bridging these two approaches. Our framework opens up a wide design space which may lead to more efficient training and data generation, and paves the way to novel architectures integrating different generative approaches and generation tasks.
△ Less
Submitted 3 October, 2024;
originally announced October 2024.
-
Variational Flow Matching for Graph Generation
Authors:
Floor Eijkelboom,
Grigory Bartosh,
Christian Andersson Naesseth,
Max Welling,
Jan-Willem van de Meent
Abstract:
We present a formulation of flow matching as variational inference, which we refer to as variational flow matching (VFM). Based on this formulation we develop CatFlow, a flow matching method for categorical data. CatFlow is easy to implement, computationally efficient, and achieves strong results on graph generation tasks. In VFM, the objective is to approximate the posterior probability path, whi…
▽ More
We present a formulation of flow matching as variational inference, which we refer to as variational flow matching (VFM). Based on this formulation we develop CatFlow, a flow matching method for categorical data. CatFlow is easy to implement, computationally efficient, and achieves strong results on graph generation tasks. In VFM, the objective is to approximate the posterior probability path, which is a distribution over possible end points of a trajectory. We show that VFM admits both the CatFlow objective and the original flow matching objective as special cases. We also relate VFM to score-based models, in which the dynamics are stochastic rather than deterministic, and derive a bound on the model likelihood based on a reweighted VFM objective. We evaluate CatFlow on one abstract graph generation task and two molecular generation tasks. In all cases, CatFlow exceeds or matches performance of the current state-of-the-art models.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
Authors:
Theodore Papamarkou,
Maria Skoularidou,
Konstantina Palla,
Laurence Aitchison,
Julyan Arbel,
David Dunson,
Maurizio Filippone,
Vincent Fortuin,
Philipp Hennig,
José Miguel Hernández-Lobato,
Aliaksandr Hubin,
Alexander Immer,
Theofanis Karaletsos,
Mohammad Emtiyaz Khan,
Agustinus Kristiadi,
Yingzhen Li,
Stephan Mandt,
Christopher Nemeth,
Michael A. Osborne,
Tim G. J. Rudner,
David Rügamer,
Yee Whye Teh,
Max Welling,
Andrew Gordon Wilson,
Ruqi Zhang
Abstract:
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…
▽ More
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.
△ Less
Submitted 6 August, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers
Authors:
Takeru Miyato,
Bernhard Jaeger,
Max Welling,
Andreas Geiger
Abstract:
As transformers are equivariant to the permutation of input tokens, encoding the positional information of tokens is necessary for many tasks. However, since existing positional encoding schemes have been initially designed for NLP tasks, their suitability for vision tasks, which typically exhibit different structural properties in their data, is questionable. We argue that existing positional enc…
▽ More
As transformers are equivariant to the permutation of input tokens, encoding the positional information of tokens is necessary for many tasks. However, since existing positional encoding schemes have been initially designed for NLP tasks, their suitability for vision tasks, which typically exhibit different structural properties in their data, is questionable. We argue that existing positional encoding schemes are suboptimal for 3D vision tasks, as they do not respect their underlying 3D geometric structure. Based on this hypothesis, we propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation determined by the geometric relationship between queries and key-value pairs. By evaluating on multiple novel view synthesis (NVS) datasets in the sparse wide-baseline multi-view setting, we show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models without any additional learned parameters and only minor computational overhead.
△ Less
Submitted 7 June, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Equivariant Diffusion for Molecule Generation in 3D
Authors:
Emiel Hoogeboom,
Victor Garcia Satorras,
Clément Vignac,
Max Welling
Abstract:
This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood…
▽ More
This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
△ Less
Submitted 16 June, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Particle Dynamics for Learning EBMs
Authors:
Kirill Neklyudov,
Priyank Jaini,
Max Welling
Abstract:
Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such samplin…
▽ More
Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such sampling paradigms run MCMC targeting the current model, which requires infinitely long chains to generate samples from the true energy distribution and is problematic in practice. This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model. We accomplish this by viewing the evolution of the modeling distribution as (i) the evolution of the energy function, and (ii) the evolution of the samples from this distribution along some vector field. We subsequently derive this time-dependent vector field such that the particles following this field are approximately distributed as the current density model. Thereby we match the evolution of the particles with the evolution of the energy function prescribed by the learning procedure. Importantly, unlike Monte Carlo sampling, our method targets to match the current distribution in a finite time. Finally, we demonstrate its effectiveness empirically compared to MCMC-based learning methods.
△ Less
Submitted 26 November, 2021;
originally announced November 2021.
-
An Expectation-Maximization Perspective on Federated Learning
Authors:
Christos Louizos,
Matthias Reisser,
Joseph Soriaga,
Max Welling
Abstract:
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device. In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters. We show that with simple Gaussian priors and a ha…
▽ More
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device. In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters. We show that with simple Gaussian priors and a hard version of the well known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting. This perspective on FedAvg unifies several recent works in the field and opens up the possibility for extensions through different choices for the hierarchical model. Based on this view, we further propose a variant of the hierarchical model that employs prior distributions to promote sparsity. By similarly using the hard-EM algorithm for learning, we obtain FedSparse, a procedure that can learn sparse neural networks in the federated learning setting. FedSparse reduces communication costs from client to server and vice-versa, as well as the computational costs for inference with the sparsified network - both of which are of great practical importance in federated learning.
△ Less
Submitted 19 November, 2021;
originally announced November 2021.
-
Geometric and Physical Quantities Improve E(3) Equivariant Message Passing
Authors:
Johannes Brandstetter,
Rob Hesselink,
Elise van der Pol,
Erik J Bekkers,
Max Welling
Abstract:
Including covariant information, such as position, force, velocity or spin is important in many tasks in computational physics and chemistry. We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs) that generalise equivariant graph networks, such that node and edge attributes are not restricted to invariant scalars, but can contain covariant information, such as vectors or tensors.…
▽ More
Including covariant information, such as position, force, velocity or spin is important in many tasks in computational physics and chemistry. We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs) that generalise equivariant graph networks, such that node and edge attributes are not restricted to invariant scalars, but can contain covariant information, such as vectors or tensors. This model, composed of steerable MLPs, is able to incorporate geometric and physical information in both the message and update functions. Through the definition of steerable node attributes, the MLPs provide a new class of activation functions for general use with steerable feature fields. We discuss ours and related work through the lens of equivariant non-linear convolutions, which further allows us to pin-point the successful components of SEGNNs: non-linear message aggregation improves upon classic linear (steerable) point convolutions; steerable messages improve upon recent equivariant graph networks that send invariant messages. We demonstrate the effectiveness of our method on several tasks in computational physics and chemistry and provide extensive ablation studies.
△ Less
Submitted 26 March, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
Neural Augmentation of Kalman Filter with Hypernetwork for Channel Tracking
Authors:
Kumar Pratik,
Rana Ali Amjad,
Arash Behboodi,
Joseph B. Soriaga,
Max Welling
Abstract:
We propose Hypernetwork Kalman Filter (HKF) for tracking applications with multiple different dynamics. The HKF combines generalization power of Kalman filters with expressive power of neural networks. Instead of keeping a bank of Kalman filters and choosing one based on approximating the actual dynamics, HKF adapts itself to each dynamics based on the observed sequence. Through extensive experime…
▽ More
We propose Hypernetwork Kalman Filter (HKF) for tracking applications with multiple different dynamics. The HKF combines generalization power of Kalman filters with expressive power of neural networks. Instead of keeping a bank of Kalman filters and choosing one based on approximating the actual dynamics, HKF adapts itself to each dynamics based on the observed sequence. Through extensive experiments on CDL-B channel model, we show that the HKF can be used for tracking the channel over a wide range of Doppler values, matching Kalman filter performance with genie Doppler information. At high Doppler values, it achieves around 2dB gain over genie Kalman filter. The HKF generalizes well to unseen Doppler, SNR values and pilot patterns unlike LSTM, which suffers from severe performance degradation.
△ Less
Submitted 26 September, 2021;
originally announced September 2021.
-
Deterministic Gibbs Sampling via Ordinary Differential Equations
Authors:
Kirill Neklyudov,
Roberto Bondesan,
Max Welling
Abstract:
Deterministic dynamics is an essential part of many MCMC algorithms, e.g. Hybrid Monte Carlo or samplers utilizing normalizing flows. This paper presents a general construction of deterministic measure-preserving dynamics using autonomous ODEs and tools from differential geometry. We show how Hybrid Monte Carlo and other deterministic samplers follow as special cases of our theory. We then demonst…
▽ More
Deterministic dynamics is an essential part of many MCMC algorithms, e.g. Hybrid Monte Carlo or samplers utilizing normalizing flows. This paper presents a general construction of deterministic measure-preserving dynamics using autonomous ODEs and tools from differential geometry. We show how Hybrid Monte Carlo and other deterministic samplers follow as special cases of our theory. We then demonstrate the utility of our approach by constructing a continuous non-sequential version of Gibbs sampling in terms of an ODE flow and extending it to discrete state spaces. We find that our deterministic samplers are more sample efficient than stochastic counterparts, even if the latter generate independent samples.
△ Less
Submitted 18 June, 2021;
originally announced June 2021.
-
Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
Authors:
Priyank Jaini,
Lars Holdijk,
Max Welling
Abstract:
We focus on the problem of efficient sampling and learning of probability densities by incorporating symmetries in probabilistic models. We first introduce Equivariant Stein Variational Gradient Descent algorithm -- an equivariant sampling method based on Stein's identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through…
▽ More
We focus on the problem of efficient sampling and learning of probability densities by incorporating symmetries in probabilistic models. We first introduce Equivariant Stein Variational Gradient Descent algorithm -- an equivariant sampling method based on Stein's identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through equivariant kernels which makes the resultant sampler efficient both in terms of sample complexity and the quality of generated samples. Subsequently, we define equivariant energy based models to model invariant densities that are learned using contrastive divergence. By utilizing our equivariant SVGD for training equivariant EBMs, we propose new ways of improving and scaling up training of energy based models. We apply these equivariant energy models for modelling joint densities in regression and classification tasks for image datasets, many-body particle systems and molecular structure generation.
△ Less
Submitted 29 July, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Coordinate Independent Convolutional Networks -- Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds
Authors:
Maurice Weiler,
Patrick Forré,
Erik Verlinde,
Max Welling
Abstract:
Motivated by the vast success of deep convolutional networks, there is a great interest in generalizing convolutions to non-Euclidean manifolds. A major complication in comparison to flat spaces is that it is unclear in which alignment a convolution kernel should be applied on a manifold. The underlying reason for this ambiguity is that general manifolds do not come with a canonical choice of refe…
▽ More
Motivated by the vast success of deep convolutional networks, there is a great interest in generalizing convolutions to non-Euclidean manifolds. A major complication in comparison to flat spaces is that it is unclear in which alignment a convolution kernel should be applied on a manifold. The underlying reason for this ambiguity is that general manifolds do not come with a canonical choice of reference frames (gauge). Kernels and features therefore have to be expressed relative to arbitrary coordinates. We argue that the particular choice of coordinatization should not affect a network's inference -- it should be coordinate independent. A simultaneous demand for coordinate independence and weight sharing is shown to result in a requirement on the network to be equivariant under local gauge transformations (changes of local reference frames). The ambiguity of reference frames depends thereby on the G-structure of the manifold, such that the necessary level of gauge equivariance is prescribed by the corresponding structure group G. Coordinate independent convolutions are proven to be equivariant w.r.t. those isometries that are symmetries of the G-structure. The resulting theory is formulated in a coordinate free fashion in terms of fiber bundles. To exemplify the design of coordinate independent convolutions, we implement a convolutional network on the Möbius strip. The generality of our differential geometric formulation of convolutional networks is demonstrated by an extensive literature review which explains a large number of Euclidean CNNs, spherical CNNs and CNNs on general surfaces as specific instances of coordinate independent convolutions.
△ Less
Submitted 10 June, 2021;
originally announced June 2021.
-
E(n) Equivariant Normalizing Flows
Authors:
Victor Garcia Satorras,
Emiel Hoogeboom,
Fabian B. Fuchs,
Ingmar Posner,
Max Welling
Abstract:
This paper introduces a generative model equivariant to Euclidean symmetries: E(n) Equivariant Normalizing Flows (E-NFs). To construct E-NFs, we take the discriminative E(n) graph neural networks and integrate them as a differential equation to obtain an invertible equivariant function: a continuous-time normalizing flow. We demonstrate that E-NFs considerably outperform baselines and existing met…
▽ More
This paper introduces a generative model equivariant to Euclidean symmetries: E(n) Equivariant Normalizing Flows (E-NFs). To construct E-NFs, we take the discriminative E(n) graph neural networks and integrate them as a differential equation to obtain an invertible equivariant function: a continuous-time normalizing flow. We demonstrate that E-NFs considerably outperform baselines and existing methods from the literature on particle systems such as DW4 and LJ13, and on molecules from QM9 in terms of log-likelihood. To the best of our knowledge, this is the first flow that jointly generates molecule features and positions in 3D.
△ Less
Submitted 14 January, 2022; v1 submitted 19 May, 2021;
originally announced May 2021.
-
A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups
Authors:
Marc Finzi,
Max Welling,
Andrew Gordon Wilson
Abstract:
Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds. Existing work has primarily focused on a small number of groups, such as the translation, rotation, and permutation groups. In this work we provide a completely general algorithm for solving for the equivariant layers of matrix groups. In addition to recovering…
▽ More
Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds. Existing work has primarily focused on a small number of groups, such as the translation, rotation, and permutation groups. In this work we provide a completely general algorithm for solving for the equivariant layers of matrix groups. In addition to recovering solutions from other works as special cases, we construct multilayer perceptrons equivariant to multiple groups that have never been tackled before, including $\mathrm{O}(1,3)$, $\mathrm{O}(5)$, $\mathrm{Sp}(n)$, and the Rubik's cube group. Our approach outperforms non-equivariant baselines, with applications to particle physics and dynamical systems. We release our software library to enable researchers to construct equivariant layers for arbitrary matrix groups.
△ Less
Submitted 19 April, 2021;
originally announced April 2021.
-
Diagnosing Vulnerability of Variational Auto-Encoders to Adversarial Attacks
Authors:
Anna Kuzina,
Max Welling,
Jakub M. Tomczak
Abstract:
In this work, we explore adversarial attacks on the Variational Autoencoders (VAE). We show how to modify data point to obtain a prescribed latent code (supervised attack) or just get a drastically different code (unsupervised attack). We examine the influence of model modifications ($β$-VAE, NVAE) on the robustness of VAEs and suggest metrics to quantify it.
In this work, we explore adversarial attacks on the Variational Autoencoders (VAE). We show how to modify data point to obtain a prescribed latent code (supervised attack) or just get a drastically different code (unsupervised attack). We examine the influence of model modifications ($β$-VAE, NVAE) on the robustness of VAEs and suggest metrics to quantify it.
△ Less
Submitted 6 May, 2021; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Combining Interventional and Observational Data Using Causal Reductions
Authors:
Maximilian Ilse,
Patrick Forré,
Max Welling,
Joris M. Mooij
Abstract:
Unobserved confounding is one of the main challenges when estimating causal effects. We propose a causal reduction method that, given a causal model, replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder that takes values in the same space as the treatment variable, without changing the observational and interventional distributions the causal…
▽ More
Unobserved confounding is one of the main challenges when estimating causal effects. We propose a causal reduction method that, given a causal model, replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder that takes values in the same space as the treatment variable, without changing the observational and interventional distributions the causal model entails. This allows us to estimate the causal effect in a principled way from combined data without relying on the common but often unrealistic assumption that all confounders have been observed. We apply our causal reduction in three different settings. In the first setting, we assume the treatment and outcome to be discrete. The causal reduction then implies bounds between the observational and interventional distributions that can be exploited for estimation purposes. In certain cases with highly unbalanced observational samples, the accuracy of the causal effect estimate can be improved by incorporating observational data. Second, for continuous variables and assuming a linear-Gaussian model, we derive equality constraints for the parameters of the observational and interventional distributions. Third, for the general continuous setting (possibly nonlinear and non-Gaussian), we parameterize the reduced causal model using normalizing flows, a flexible class of easily invertible nonlinear transformations. We perform a series of experiments on synthetic data and find that in several cases the number of interventional samples can be reduced when adding observational training samples without sacrificing accuracy.
△ Less
Submitted 22 February, 2023; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Batch Bayesian Optimization on Permutations using the Acquisition Weighted Kernel
Authors:
Changyong Oh,
Roberto Bondesan,
Efstratios Gavves,
Max Welling
Abstract:
In this work we propose a batch Bayesian optimization method for combinatorial problems on permutations, which is well suited for expensive-to-evaluate objectives. We first introduce LAW, an efficient batch acquisition method based on determinantal point processes using the acquisition weighted kernel. Relying on multiple parallel evaluations, LAW enables accelerated search on combinatorial spaces…
▽ More
In this work we propose a batch Bayesian optimization method for combinatorial problems on permutations, which is well suited for expensive-to-evaluate objectives. We first introduce LAW, an efficient batch acquisition method based on determinantal point processes using the acquisition weighted kernel. Relying on multiple parallel evaluations, LAW enables accelerated search on combinatorial spaces. We then apply the framework to permutation problems, which have so far received little attention in the Bayesian Optimization literature, despite their practical importance. We call this method LAW2ORDER. On the theoretical front, we prove that LAW2ORDER has vanishing simple regret by showing that the batch cumulative regret is sublinear. Empirically, we assess the method on several standard combinatorial problems involving permutations such as quadratic assignment, flowshop scheduling and the traveling salesman, as well as on a structure learning task.
△ Less
Submitted 25 January, 2023; v1 submitted 26 February, 2021;
originally announced February 2021.
-
Mixed Variable Bayesian Optimization with Frequency Modulated Kernels
Authors:
Changyong Oh,
Efstratios Gavves,
Max Welling
Abstract:
The sample efficiency of Bayesian optimization(BO) is often boosted by Gaussian Process(GP) surrogate models. However, on mixed variable spaces, surrogate models other than GPs are prevalent, mainly due to the lack of kernels which can model complex dependencies across different types of variables. In this paper, we propose the frequency modulated (FM) kernel flexibly modeling dependencies among d…
▽ More
The sample efficiency of Bayesian optimization(BO) is often boosted by Gaussian Process(GP) surrogate models. However, on mixed variable spaces, surrogate models other than GPs are prevalent, mainly due to the lack of kernels which can model complex dependencies across different types of variables. In this paper, we propose the frequency modulated (FM) kernel flexibly modeling dependencies among different types of variables, so that BO can enjoy the further improved sample efficiency. The FM kernel uses distances on continuous variables to modulate the graph Fourier spectrum derived from discrete variables. However, the frequency modulation does not always define a kernel with the similarity measure behavior which returns higher values for pairs of more similar points. Therefore, we specify and prove conditions for FM kernels to be positive definite and to exhibit the similarity measure behavior. In experiments, we demonstrate the improved sample efficiency of GP BO using FM kernels (BO-FM).On synthetic problems and hyperparameter optimization problems, BO-FM outperforms competitors consistently. Also, the importance of the frequency modulation principle is empirically demonstrated on the same problems. On joint optimization of neural architectures and SGD hyperparameters, BO-FM outperforms competitors including Regularized evolution(RE) and BOHB. Remarkably, BO-FM performs better even than RE and BOHB using three times as many evaluations.
△ Less
Submitted 18 July, 2022; v1 submitted 25 February, 2021;
originally announced February 2021.
-
Deep Policy Dynamic Programming for Vehicle Routing Problems
Authors:
Wouter Kool,
Herke van Hoof,
Joaquim Gromicho,
Max Welling
Abstract:
Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims…
▽ More
Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms guarantee optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP), the vehicle routing problem (VRP) and TSP with time windows (TSPTW) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming most other 'neural approaches' for solving TSPs, VRPs and TSPTWs with 100 nodes.
△ Less
Submitted 2 December, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
E(n) Equivariant Graph Neural Networks
Authors:
Victor Garcia Satorras,
Emiel Hoogeboom,
Max Welling
Abstract:
This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations called E(n)-Equivariant Graph Neural Networks (EGNNs). In contrast with existing methods, our work does not require computationally expensive higher-order representations in intermediate layers while it still achieves competitive or better performance. In addition,…
▽ More
This paper introduces a new model to learn graph neural networks equivariant to rotations, translations, reflections and permutations called E(n)-Equivariant Graph Neural Networks (EGNNs). In contrast with existing methods, our work does not require computationally expensive higher-order representations in intermediate layers while it still achieves competitive or better performance. In addition, whereas existing methods are limited to equivariance on 3 dimensional spaces, our model is easily scaled to higher-dimensional spaces. We demonstrate the effectiveness of our method on dynamical systems modelling, representation learning in graph autoencoders and predicting molecular properties.
△ Less
Submitted 16 February, 2022; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Authors:
Emiel Hoogeboom,
Didrik Nielsen,
Priyank Jaini,
Patrick Forré,
Max Welling
Abstract:
Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function.…
▽ More
Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood.
△ Less
Submitted 22 October, 2021; v1 submitted 10 February, 2021;
originally announced February 2021.
-
Self Normalizing Flows
Authors:
T. Anderson Keller,
Jorn W. T. Peters,
Priyank Jaini,
Emiel Hoogeboom,
Patrick Forré,
Max Welling
Abstract:
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models,…
▽ More
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.
△ Less
Submitted 9 June, 2021; v1 submitted 14 November, 2020;
originally announced November 2020.
-
Orbital MCMC
Authors:
Kirill Neklyudov,
Max Welling
Abstract:
Markov Chain Monte Carlo (MCMC) algorithms ubiquitously employ complex deterministic transformations to generate proposal points that are then filtered by the Metropolis-Hastings-Green (MHG) test. However, the condition of the target measure invariance puts restrictions on the design of these transformations. In this paper, we first derive the acceptance test for the stochastic Markov kernel consi…
▽ More
Markov Chain Monte Carlo (MCMC) algorithms ubiquitously employ complex deterministic transformations to generate proposal points that are then filtered by the Metropolis-Hastings-Green (MHG) test. However, the condition of the target measure invariance puts restrictions on the design of these transformations. In this paper, we first derive the acceptance test for the stochastic Markov kernel considering arbitrary deterministic maps as proposal generators. When applied to the transformations with orbits of period two (involutions), the test reduces to the MHG test. Based on the derived test we propose two practical algorithms: one operates by constructing periodic orbits from any diffeomorphism, another on contractions of the state space (such as optimization trajectories). Finally, we perform an empirical study demonstrating the practical advantages of both kernels.
△ Less
Submitted 7 June, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Natural Graph Networks
Authors:
Pim de Haan,
Taco Cohen,
Max Welling
Abstract:
A key requirement for graph neural networks is that they must process a graph in a way that does not depend on how the graph is described. Traditionally this has been taken to mean that a graph network must be equivariant to node permutations. Here we show that instead of equivariance, the more general concept of naturality is sufficient for a graph network to be well-defined, opening up a larger…
▽ More
A key requirement for graph neural networks is that they must process a graph in a way that does not depend on how the graph is described. Traditionally this has been taken to mean that a graph network must be equivariant to node permutations. Here we show that instead of equivariance, the more general concept of naturality is sufficient for a graph network to be well-defined, opening up a larger class of graph networks. We define global and local natural graph networks, the latter of which are as scalable as conventional message passing graph neural networks while being more flexible. We give one practical instantiation of a natural network on graphs which uses an equivariant message network parameterization, yielding good performance on several benchmarks.
△ Less
Submitted 23 November, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Federated Learning of User Authentication Models
Authors:
Hossein Hosseini,
Sungrack Yun,
Hyunsin Park,
Christos Louizos,
Joseph Soriaga,
Max Welling
Abstract:
Machine learning-based User Authentication (UA) models have been widely deployed in smart devices. UA models are trained to map input data of different users to highly separable embedding vectors, which are then used to accept or reject new inputs at test time. Training UA models requires having direct access to the raw inputs and embedding vectors of users, both of which are privacy-sensitive inf…
▽ More
Machine learning-based User Authentication (UA) models have been widely deployed in smart devices. UA models are trained to map input data of different users to highly separable embedding vectors, which are then used to accept or reject new inputs at test time. Training UA models requires having direct access to the raw inputs and embedding vectors of users, both of which are privacy-sensitive information. In this paper, we propose Federated User Authentication (FedUA), a framework for privacy-preserving training of UA models. FedUA adopts federated learning framework to enable a group of users to jointly train a model without sharing the raw inputs. It also allows users to generate their embeddings as random binary vectors, so that, unlike the existing approach of constructing the spread out embeddings by the server, the embedding vectors are kept private as well. We show our method is privacy-preserving, scalable with number of users, and allows new users to be added to training without changing the output layer. Our experimental results on the VoxCeleb dataset for speaker verification shows our method reliably rejects data of unseen users at very high true positive rates.
△ Less
Submitted 9 July, 2020;
originally announced July 2020.
-
SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows
Authors:
Didrik Nielsen,
Priyank Jaini,
Emiel Hoogeboom,
Ole Winther,
Max Welling
Abstract:
Normalizing flows and variational autoencoders are powerful generative models that can represent complicated density functions. However, they both impose constraints on the models: Normalizing flows use bijective transformations to model densities whereas VAEs learn stochastic transformations that are non-invertible and thus typically do not provide tractable estimates of the marginal likelihood.…
▽ More
Normalizing flows and variational autoencoders are powerful generative models that can represent complicated density functions. However, they both impose constraints on the models: Normalizing flows use bijective transformations to model densities whereas VAEs learn stochastic transformations that are non-invertible and thus typically do not provide tractable estimates of the marginal likelihood. In this paper, we introduce SurVAE Flows: A modular framework of composable transformations that encompasses VAEs and normalizing flows. SurVAE Flows bridge the gap between normalizing flows and VAEs with surjective transformations, wherein the transformations are deterministic in one direction -- thereby allowing exact likelihood computation, and stochastic in the reverse direction -- hence providing a lower bound on the corresponding likelihood. We show that several recently proposed methods, including dequantization and augmented normalizing flows, can be expressed as SurVAE Flows. Finally, we introduce common operations such as the max value, the absolute value, sorting and stochastic permutation as composable layers in SurVAE Flows.
△ Less
Submitted 30 October, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
RE-MIMO: Recurrent and Permutation Equivariant Neural MIMO Detection
Authors:
Kumar Pratik,
Bhaskar D. Rao,
Max Welling
Abstract:
In this paper, we present a novel neural network for MIMO symbol detection. It is motivated by several important considerations in wireless communication systems; permutation equivariance and a variable number of users. The neural detector learns an iterative decoding algorithm that is implemented as a stack of iterative units. Each iterative unit is a neural computation module comprising of 3 sub…
▽ More
In this paper, we present a novel neural network for MIMO symbol detection. It is motivated by several important considerations in wireless communication systems; permutation equivariance and a variable number of users. The neural detector learns an iterative decoding algorithm that is implemented as a stack of iterative units. Each iterative unit is a neural computation module comprising of 3 sub-modules: the likelihood module, the encoder module, and the predictor module. The likelihood module injects information about the generative (forward) process into the neural network. The encoder-predictor modules together update the state vector and symbol estimates. The encoder module updates the state vector and employs a transformer based attention network to handle the interactions among the users in a permutation equivariant manner. The predictor module refines the symbol estimates. The modular and permutation equivariant architecture allows for dealing with a varying number of users. The resulting neural detector architecture is unique and exhibits several desirable properties unseen in any of the previously proposed neural detectors. We compare its performance against existing methods and the results show the ability of our network to efficiently handle a variable number of transmitters with high accuracy.
△ Less
Submitted 23 January, 2021; v1 submitted 30 June, 2020;
originally announced July 2020.
-
MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
Authors:
Elise van der Pol,
Daniel E. Worrall,
Herke van Hoof,
Frans A. Oliehoek,
Max Welling
Abstract:
This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance con…
▽ More
This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
△ Less
Submitted 20 January, 2021; v1 submitted 30 June, 2020;
originally announced June 2020.
-
Involutive MCMC: a Unifying Framework
Authors:
Kirill Neklyudov,
Max Welling,
Evgenii Egorov,
Dmitry Vetrov
Abstract:
Markov Chain Monte Carlo (MCMC) is a computational approach to fundamental problems such as inference, integration, optimization, and simulation. The field has developed a broad spectrum of algorithms, varying in the way they are motivated, the way they are applied and how efficiently they sample. Despite all the differences, many of them share the same core principle, which we unify as the Involu…
▽ More
Markov Chain Monte Carlo (MCMC) is a computational approach to fundamental problems such as inference, integration, optimization, and simulation. The field has developed a broad spectrum of algorithms, varying in the way they are motivated, the way they are applied and how efficiently they sample. Despite all the differences, many of them share the same core principle, which we unify as the Involutive MCMC (iMCMC) framework. Building upon this, we describe a wide range of MCMC algorithms in terms of iMCMC, and formulate a number of "tricks" which one can use as design principles for developing new MCMC algorithms. Thus, iMCMC provides a unified view of many known MCMC algorithms, which facilitates the derivation of powerful extensions. We demonstrate the latter with two examples where we transform known reversible MCMC algorithms into more efficient irreversible ones.
△ Less
Submitted 30 June, 2020;
originally announced June 2020.
-
Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data
Authors:
Sindy Löwe,
David Madras,
Richard Zemel,
Max Welling
Abstract:
On time-series data, most causal discovery methods fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information which is lost when following this approach. Specifically, different samples may share the dynamics which describe the effects of their causal relations. We propose Amortized Causal Discovery, a novel framework…
▽ More
On time-series data, most causal discovery methods fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information which is lost when following this approach. Specifically, different samples may share the dynamics which describe the effects of their causal relations. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus leverages the shared dynamics information. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under added noise and hidden confounding.
△ Less
Submitted 21 February, 2022; v1 submitted 18 June, 2020;
originally announced June 2020.
-
SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks
Authors:
Fabian B. Fuchs,
Daniel E. Worrall,
Volker Fischer,
Max Welling
Abstract:
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the data input. A positive corollary of equivariance is increased weight-tying within the model. The SE(3)-Transfor…
▽ More
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations. Equivariance is important to ensure stable and predictable performance in the presence of nuisance transformations of the data input. A positive corollary of equivariance is increased weight-tying within the model. The SE(3)-Transformer leverages the benefits of self-attention to operate on large point clouds and graphs with varying number of points, while guaranteeing SE(3)-equivariance for robustness. We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input. We further achieve competitive performance on two real-world datasets, ScanObjectNN and QM9. In all cases, our model outperforms a strong, non-equivariant attention baseline and an equivariant model without attention.
△ Less
Submitted 24 November, 2020; v1 submitted 18 June, 2020;
originally announced June 2020.
-
The Convolution Exponential and Generalized Sylvester Flows
Authors:
Emiel Hoogeboom,
Victor Garcia Satorras,
Jakub M. Tomczak,
Max Welling
Abstract:
This paper introduces a new method to build linear flows, by taking the exponential of a linear transformation. This linear transformation does not need to be invertible itself, and the exponential has the following desirable properties: it is guaranteed to be invertible, its inverse is straightforward to compute and the log Jacobian determinant is equal to the trace of the linear transformation.…
▽ More
This paper introduces a new method to build linear flows, by taking the exponential of a linear transformation. This linear transformation does not need to be invertible itself, and the exponential has the following desirable properties: it is guaranteed to be invertible, its inverse is straightforward to compute and the log Jacobian determinant is equal to the trace of the linear transformation. An important insight is that the exponential can be computed implicitly, which allows the use of convolutional layers. Using this insight, we develop new invertible transformations named convolution exponentials and graph convolution exponentials, which retain the equivariance of their underlying transformations. In addition, we generalize Sylvester Flows and propose Convolutional Sylvester Flows which are based on the generalization and the convolution exponential as basis change. Empirically, we show that the convolution exponential outperforms other linear transformations in generative flows on CIFAR10 and the graph convolution exponential improves the performance of graph normalizing flows. In addition, we show that Convolutional Sylvester Flows improve performance over residual flows as a generative flow model measured in log-likelihood.
△ Less
Submitted 26 October, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Bayesian Bits: Unifying Quantization and Pruning
Authors:
Mart van Baalen,
Christos Louizos,
Markus Nagel,
Rana Ali Amjad,
Ying Wang,
Tijmen Blankevoort,
Max Welling
Abstract:
We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide…
▽ More
We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide whether or not to add this quantized residual error for a higher effective bit width and lower quantization noise. By starting with a power-of-two bit width, this decomposition will always produce hardware-friendly configurations, and through an additional 0-bit option, serves as a unified view of pruning and quantization. Bayesian Bits then introduces learnable stochastic gates, which collectively control the bit width of the given tensor. As a result, we can obtain low bit solutions by performing approximate inference over the gates, with prior distributions that encourage most of them to be switched off. We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks that provide a better trade-off between accuracy and efficiency than their static bit width equivalents.
△ Less
Submitted 27 October, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
A Data and Compute Efficient Design for Limited-Resources Deep Learning
Authors:
Mirgahney Mohamed,
Gabriele Cesa,
Taco S. Cohen,
Max Welling
Abstract:
Thanks to their improved data efficiency, equivariant neural networks have gained increased interest in the deep learning community. They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models. To be able to reach a much larger body of patients, mobile, on-device implementations of deep learning soluti…
▽ More
Thanks to their improved data efficiency, equivariant neural networks have gained increased interest in the deep learning community. They have been successfully applied in the medical domain where symmetries in the data can be effectively exploited to build more accurate and robust models. To be able to reach a much larger body of patients, mobile, on-device implementations of deep learning solutions have been developed for medical applications. However, equivariant models are commonly implemented using large and computationally expensive architectures, not suitable to run on mobile devices. In this work, we design and test an equivariant version of MobileNetV2 and further optimize it with model quantization to enable more efficient inference. We achieve close-to state of the art performance on the Patch Camelyon (PCam) medical dataset while being more computationally efficient.
△ Less
Submitted 8 July, 2020; v1 submitted 20 April, 2020;
originally announced April 2020.
-
Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs
Authors:
Pim de Haan,
Maurice Weiler,
Taco Cohen,
Max Welling
Abstract:
A common approach to define convolutions on meshes is to interpret them as a graph and apply graph convolutional networks (GCNs). Such GCNs utilize isotropic kernels and are therefore insensitive to the relative orientation of vertices and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs which generalize GCNs to apply anisotropic gauge equivariant kernels. Since…
▽ More
A common approach to define convolutions on meshes is to interpret them as a graph and apply graph convolutional networks (GCNs). Such GCNs utilize isotropic kernels and are therefore insensitive to the relative orientation of vertices and thus to the geometry of the mesh as a whole. We propose Gauge Equivariant Mesh CNNs which generalize GCNs to apply anisotropic gauge equivariant kernels. Since the resulting features carry orientation information, we introduce a geometric message passing scheme defined by parallel transporting features over mesh edges. Our experiments validate the significantly improved expressivity of the proposed model over conventional GCNs and other methods.
△ Less
Submitted 19 November, 2021; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Neural Enhanced Belief Propagation on Factor Graphs
Authors:
Victor Garcia Satorras,
Max Welling
Abstract:
A graphical model is a structured representation of locally dependent random variables. A traditional method to reason over these random variables is to perform inference using belief propagation. When provided with the true data generating process, belief propagation can infer the optimal posterior probability estimates in tree structured factor graphs. However, in many cases we may only have acc…
▽ More
A graphical model is a structured representation of locally dependent random variables. A traditional method to reason over these random variables is to perform inference using belief propagation. When provided with the true data generating process, belief propagation can infer the optimal posterior probability estimates in tree structured factor graphs. However, in many cases we may only have access to a poor approximation of the data generating process, or we may face loops in the factor graph, leading to suboptimal estimates. In this work we first extend graph neural networks to factor graphs (FG-GNN). We then propose a new hybrid model that runs conjointly a FG-GNN with belief propagation. The FG-GNN receives as input messages from belief propagation at every inference iteration and outputs a corrected version of them. As a result, we obtain a more accurate algorithm that combines the benefits of both belief propagation and graph neural networks. We apply our ideas to error correction decoding tasks, and we show that our algorithm can outperform belief propagation for LDPC codes on bursty channels.
△ Less
Submitted 16 March, 2021; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
Authors:
Elise van der Pol,
Thomas Kipf,
Frans A. Oliehoek,
Max Welling
Abstract:
This work exploits action equivariance for representation learning in reinforcement learning. Equivariance under actions states that transitions in the input space are mirrored by equivalent transitions in latent space, while the map and transition functions should also commute. We introduce a contrastive loss function that enforces action equivariance on the learned representations. We prove that…
▽ More
This work exploits action equivariance for representation learning in reinforcement learning. Equivariance under actions states that transitions in the input space are mirrored by equivalent transitions in latent space, while the map and transition functions should also commute. We introduce a contrastive loss function that enforces action equivariance on the learned representations. We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process (MDP). Learning equivariant maps leads to structured latent spaces, allowing us to build a model on which we plan through value iteration. We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP. Moreover, the approach easily adapts to changes in the goal states. Empirically, we show that in such MDPs, we obtain better representations in fewer epochs compared to representation learning approaches using reconstructions, while generalizing better to new goals than model-free approaches.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
Gradient $\ell_1$ Regularization for Quantization Robustness
Authors:
Milad Alizadeh,
Arash Behboodi,
Mart van Baalen,
Christos Louizos,
Tijmen Blankevoort,
Max Welling
Abstract:
We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application c…
▽ More
We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for "on the fly'' post-training quantization to various bit-widths. We show that by modeling quantization as a $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets.
△ Less
Submitted 18 February, 2020;
originally announced February 2020.
-
Estimating Gradients for Discrete Random Variables by Sampling without Replacement
Authors:
Wouter Kool,
Herke van Hoof,
Max Welling
Abstract:
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in con…
▽ More
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
△ Less
Submitted 14 February, 2020;
originally announced February 2020.
-
Learning to Predict Error for MRI Reconstruction
Authors:
Shi Hu,
Nicola Pezzotti,
Max Welling
Abstract:
In healthcare applications, predictive uncertainty has been used to assess predictive accuracy. In this paper, we demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error by decomposing the latter into random and systematic errors, and showing that the former is equivalent to the variance of the random error. In addition, we observe t…
▽ More
In healthcare applications, predictive uncertainty has been used to assess predictive accuracy. In this paper, we demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error by decomposing the latter into random and systematic errors, and showing that the former is equivalent to the variance of the random error. In addition, we observe that current methods unnecessarily compromise performance by modifying the model and training loss to estimate the target and uncertainty jointly. We show that estimating them separately without modifications improves performance. Following this, we propose a novel method that estimates the target labels and magnitude of the prediction error in two steps. We demonstrate this method on a large-scale MRI reconstruction task, and achieve significantly better results than the state-of-the-art uncertainty estimation methods.
△ Less
Submitted 6 July, 2021; v1 submitted 13 February, 2020;
originally announced February 2020.
-
Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks
Authors:
Andrey Kuzmin,
Markus Nagel,
Saurabh Pitre,
Sandeep Pendyam,
Tijmen Blankevoort,
Max Welling
Abstract:
The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures. One effective way of making networks more efficient is neural network compression. We provide an overview of existing neural network compression methods that can be used to make neural networks more efficient by changing the architecture of the network. First,…
▽ More
The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures. One effective way of making networks more efficient is neural network compression. We provide an overview of existing neural network compression methods that can be used to make neural networks more efficient by changing the architecture of the network. First, we introduce a new way to categorize all published compression methods, based on the amount of data and compute needed to make the methods work in practice. These are three 'levels of compression solutions'. Second, we provide a taxonomy of tensor factorization based and probabilistic compression methods. Finally, we perform an extensive evaluation of different compression techniques from the literature for models trained on ImageNet. We show that SVD and probabilistic compression or pruning methods are complementary and give the best results of all the considered methods. We also provide practical ways to combine them.
△ Less
Submitted 20 December, 2019;
originally announced December 2019.
-
Learning Likelihoods with Conditional Normalizing Flows
Authors:
Christina Winkler,
Daniel Worrall,
Emiel Hoogeboom,
Max Welling
Abstract:
Normalizing Flows (NFs) are able to model complicated distributions p(y) with strong inter-dimensional correlations and high multimodality by transforming a simple base density p(z) through an invertible neural network under the change of variables formula. Such behavior is desirable in multivariate structured prediction tasks, where handcrafted per-pixel loss-based methods inadequately capture st…
▽ More
Normalizing Flows (NFs) are able to model complicated distributions p(y) with strong inter-dimensional correlations and high multimodality by transforming a simple base density p(z) through an invertible neural network under the change of variables formula. Such behavior is desirable in multivariate structured prediction tasks, where handcrafted per-pixel loss-based methods inadequately capture strong correlations between output dimensions. We present a study of conditional normalizing flows (CNFs), a class of NFs where the base density to output space mapping is conditioned on an input x, to model conditional densities p(y|x). CNFs are efficient in sampling and inference, they can be trained with a likelihood-based objective, and CNFs, being generative flows, do not suffer from mode collapse or training instabilities. We provide an effective method to train continuous CNFs for binary problems and in particular, we apply these CNFs to super-resolution and vessel segmentation tasks demonstrating competitive performance on standard benchmark datasets in terms of likelihood and conventional metrics.
△ Less
Submitted 12 November, 2023; v1 submitted 29 November, 2019;
originally announced December 2019.
-
Contrastive Learning of Structured World Models
Authors:
Thomas Kipf,
Elise van der Pol,
Max Welling
Abstract:
A structured understanding of our world in terms of objects, relations, and hierarchies is an important component of human cognition. Learning such a structured world model from raw sensory data remains a challenge. As a step towards this goal, we introduce Contrastively-trained Structured World Models (C-SWMs). C-SWMs utilize a contrastive approach for representation learning in environments with…
▽ More
A structured understanding of our world in terms of objects, relations, and hierarchies is an important component of human cognition. Learning such a structured world model from raw sensory data remains a challenge. As a step towards this goal, we introduce Contrastively-trained Structured World Models (C-SWMs). C-SWMs utilize a contrastive approach for representation learning in environments with compositional structure. We structure each state embedding as a set of object representations and their relations, modeled by a graph neural network. This allows objects to be discovered from raw pixel observations without direct supervision as part of the learning process. We evaluate C-SWMs on compositional environments involving multiple interacting objects that can be manipulated independently by an agent, simple Atari games, and a multi-object physics simulation. Our experiments demonstrate that C-SWMs can overcome limitations of models based on pixel reconstruction and outperform typical representatives of this model class in highly structured environments, while learning interpretable object-based representations.
△ Less
Submitted 5 January, 2020; v1 submitted 27 November, 2019;
originally announced November 2019.
-
Invert to Learn to Invert
Authors:
Patrick Putzky,
Max Welling
Abstract:
Iterative learning to infer approaches have become popular solvers for inverse problems. However, their memory requirements during training grow linearly with model depth, limiting in practice model expressiveness. In this work, we propose an iterative inverse model with constant memory that relies on invertible networks to avoid storing intermediate activations. As a result, the proposed approach…
▽ More
Iterative learning to infer approaches have become popular solvers for inverse problems. However, their memory requirements during training grow linearly with model depth, limiting in practice model expressiveness. In this work, we propose an iterative inverse model with constant memory that relies on invertible networks to avoid storing intermediate activations. As a result, the proposed approach allows us to train models with 400 layers on 3D volumes in an MRI image reconstruction task. In experiments on a public data set, we demonstrate that these deeper, and thus more expressive, networks perform state-of-the-art image reconstruction.
△ Less
Submitted 25 November, 2019;
originally announced November 2019.
-
DP-MAC: The Differentially Private Method of Auxiliary Coordinates for Deep Learning
Authors:
Frederik Harder,
Jonas Köhler,
Max Welling,
Mijung Park
Abstract:
Developing a differentially private deep learning algorithm is challenging, due to the difficulty in analyzing the sensitivity of objective functions that are typically used to train deep neural networks. Many existing methods resort to the stochastic gradient descent algorithm and apply a pre-defined sensitivity to the gradients for privatizing weights. However, their slow convergence typically y…
▽ More
Developing a differentially private deep learning algorithm is challenging, due to the difficulty in analyzing the sensitivity of objective functions that are typically used to train deep neural networks. Many existing methods resort to the stochastic gradient descent algorithm and apply a pre-defined sensitivity to the gradients for privatizing weights. However, their slow convergence typically yields a high cumulative privacy loss. Here, we take a different route by employing the method of auxiliary coordinates, which allows us to independently update the weights per layer by optimizing a per-layer objective function. This objective function can be well approximated by a low-order Taylor's expansion, in which sensitivity analysis becomes tractable. We perturb the coefficients of the expansion for privacy, which we optimize using more advanced optimization routines than SGD for faster convergence. We empirically show that our algorithm provides a decent trained model quality under a modest privacy budget.
△ Less
Submitted 15 October, 2019;
originally announced October 2019.
-
Relational Generalized Few-Shot Learning
Authors:
Xiahan Shi,
Leonard Salewski,
Martin Schiegg,
Zeynep Akata,
Max Welling
Abstract:
Transferring learned models to novel tasks is a challenging problem, particularly if only very few labeled examples are available. Although this few-shot learning setup has received a lot of attention recently, most proposed methods focus on discriminating novel classes only. Instead, we consider the extended setup of generalized few-shot learning (GFSL), where the model is required to perform cla…
▽ More
Transferring learned models to novel tasks is a challenging problem, particularly if only very few labeled examples are available. Although this few-shot learning setup has received a lot of attention recently, most proposed methods focus on discriminating novel classes only. Instead, we consider the extended setup of generalized few-shot learning (GFSL), where the model is required to perform classification on the joint label space consisting of both previously seen and novel classes. We propose a graph-based framework that explicitly models relationships between all seen and novel classes in the joint label space. Our model Graph-convolutional Global Prototypical Networks (GcGPN) incorporates these inter-class relations using graph-convolution in order to embed novel class representations into the existing space of previously seen classes in a globally consistent manner. Our approach ensures both fast adaptation and global discrimination, which is the major challenge in GFSL. We demonstrate the benefits of our model on two challenging benchmark datasets.
△ Less
Submitted 15 September, 2020; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Batch-Shaping for Learning Conditional Channel Gated Networks
Authors:
Babak Ehteshami Bejnordi,
Tijmen Blankevoort,
Max Welling
Abstract:
We present a method that trains large capacity neural networks with significantly improved accuracy and lower dynamic computational cost. We achieve this by gating the deep-learning architecture on a fine-grained-level. Individual convolutional maps are turned on/off conditionally on features in the network. To achieve this, we introduce a new residual block architecture that gates convolutional c…
▽ More
We present a method that trains large capacity neural networks with significantly improved accuracy and lower dynamic computational cost. We achieve this by gating the deep-learning architecture on a fine-grained-level. Individual convolutional maps are turned on/off conditionally on features in the network. To achieve this, we introduce a new residual block architecture that gates convolutional channels in a fine-grained manner. We also introduce a generally applicable tool $batch$-$shaping$ that matches the marginal aggregate posteriors of features in a neural network to a pre-specified prior distribution. We use this novel technique to force gates to be more conditional on the data. We present results on CIFAR-10 and ImageNet datasets for image classification, and Cityscapes for semantic segmentation. Our results show that our method can slim down large architectures conditionally, such that the average computational cost on the data is on par with a smaller architecture, but with higher accuracy. In particular, on ImageNet, our ResNet50 and ResNet34 gated networks obtain 74.60% and 72.55% top-1 accuracy compared to the 69.76% accuracy of the baseline ResNet18 model, for similar complexity. We also show that the resulting networks automatically learn to use more features for difficult examples and fewer features for simple examples.
△ Less
Submitted 3 April, 2020; v1 submitted 15 July, 2019;
originally announced July 2019.
-
Supervised Uncertainty Quantification for Segmentation with Multiple Annotations
Authors:
Shi Hu,
Daniel Worrall,
Stefan Knegt,
Bas Veeling,
Henkjan Huisman,
Max Welling
Abstract:
The accurate estimation of predictive uncertainty carries importance in medical scenarios such as lung node segmentation. Unfortunately, most existing works on predictive uncertainty do not return calibrated uncertainty estimates, which could be used in practice. In this work we exploit multi-grader annotation variability as a source of 'groundtruth' aleatoric uncertainty, which can be treated as…
▽ More
The accurate estimation of predictive uncertainty carries importance in medical scenarios such as lung node segmentation. Unfortunately, most existing works on predictive uncertainty do not return calibrated uncertainty estimates, which could be used in practice. In this work we exploit multi-grader annotation variability as a source of 'groundtruth' aleatoric uncertainty, which can be treated as a target in a supervised learning problem. We combine this groundtruth uncertainty with a Probabilistic U-Net and test on the LIDC-IDRI lung nodule CT dataset and MICCAI2012 prostate MRI dataset. We find that we are able to improve predictive uncertainty estimates. We also find that we can improve sample accuracy and sample diversity. In real-world applications, our method could inform doctors about the confidence of the segmentation results.
△ Less
Submitted 27 May, 2022; v1 submitted 3 July, 2019;
originally announced July 2019.
-
The Functional Neural Process
Authors:
Christos Louizos,
Xiahan Shi,
Klamer Schutte,
Max Welling
Abstract:
We present a new family of exchangeable stochastic processes, the Functional Neural Processes (FNPs). FNPs model distributions over functions by learning a graph of dependencies on top of latent representations of the points in the given dataset. In doing so, they define a Bayesian model without explicitly positing a prior distribution over latent global parameters; they instead adopt priors over…
▽ More
We present a new family of exchangeable stochastic processes, the Functional Neural Processes (FNPs). FNPs model distributions over functions by learning a graph of dependencies on top of latent representations of the points in the given dataset. In doing so, they define a Bayesian model without explicitly positing a prior distribution over latent global parameters; they instead adopt priors over the relational structure of the given dataset, a task that is much simpler. We show how we can learn such models from data, demonstrate that they are scalable to large datasets through mini-batch optimization and describe how we can make predictions for new points via their posterior predictive distribution. We experimentally evaluate FNPs on the tasks of toy regression and image classification and show that, when compared to baselines that employ global latent parameters, they offer both competitive predictions as well as more robust uncertainty estimates.
△ Less
Submitted 4 November, 2019; v1 submitted 19 June, 2019;
originally announced June 2019.