-
Solving Bayesian inverse problems with diffusion priors and off-policy RL
Authors:
Luca Scimeca,
Siddarth Venkatraman,
Moksh Jain,
Minsu Kim,
Marcin Sendera,
Mohsin Hasan,
Luke Rowe,
Sarthak Mittal,
Pablo Lemos,
Emmanuel Bengio,
Alexandre Adam,
Jarrid Rector-Brooks,
Yashar Hezaveh,
Laurence Perreault-Levasseur,
Yoshua Bengio,
Glen Berseth,
Nikolay Malkin
Abstract:
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems…
▽ More
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.
△ Less
Submitted 12 March, 2025;
originally announced March 2025.
-
SEMU: Singular Value Decomposition for Efficient Machine Unlearning
Authors:
Marcin Sendera,
Łukasz Struski,
Kamil Książek,
Kryspin Musiol,
Jacek Tabor,
Dawid Rymarczyk
Abstract:
While the capabilities of generative foundational models have advanced rapidly in recent years, methods to prevent harmful and unsafe behaviors remain underdeveloped. Among the pressing challenges in AI safety, machine unlearning (MU) has become increasingly critical to meet upcoming safety regulations. Most existing MU approaches focus on altering the most significant parameters of the model. How…
▽ More
While the capabilities of generative foundational models have advanced rapidly in recent years, methods to prevent harmful and unsafe behaviors remain underdeveloped. Among the pressing challenges in AI safety, machine unlearning (MU) has become increasingly critical to meet upcoming safety regulations. Most existing MU approaches focus on altering the most significant parameters of the model. However, these methods often require fine-tuning substantial portions of the model, resulting in high computational costs and training instabilities, which are typically mitigated by access to the original training dataset.
In this work, we address these limitations by leveraging Singular Value Decomposition (SVD) to create a compact, low-dimensional projection that enables the selective forgetting of specific data points. We propose Singular Value Decomposition for Efficient Machine Unlearning (SEMU), a novel approach designed to optimize MU in two key aspects. First, SEMU minimizes the number of model parameters that need to be modified, effectively removing unwanted knowledge while making only minimal changes to the model's weights. Second, SEMU eliminates the dependency on the original training dataset, preserving the model's previously acquired knowledge without additional data requirements.
Extensive experiments demonstrate that SEMU achieves competitive performance while significantly improving efficiency in terms of both data usage and the number of modified parameters.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Authors:
Siddarth Venkatraman,
Mohsin Hasan,
Minsu Kim,
Luca Scimeca,
Marcin Sendera,
Yoshua Bengio,
Glen Berseth,
Nikolay Malkin
Abstract:
Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous ('outsourced') Gaussian noise variable $\mathbf{z}$: $\mathbf{x}=f_θ(\mathbf{z})$. In such a model (\eg, a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_θ(\mathbf{x})$ is straightforward, but sampling from a posterior…
▽ More
Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous ('outsourced') Gaussian noise variable $\mathbf{z}$: $\mathbf{x}=f_θ(\mathbf{z})$. In such a model (\eg, a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_θ(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_θ(\mathbf{x})r(\mathbf{x},\mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). These diffusion samplers are trained by reinforcement learning algorithms to enforce that the transformed samples $f_θ(\mathbf{z})$ are distributed according to the posterior in the data space ($\mathbf{x}$). For many models and constraints, the posterior in noise space is smoother than in data space, making it more suitable for amortized inference. Our method enables conditional sampling under unconditional GAN, (H)VAE, and flow-based priors, comparing favorably with other inference methods. We demonstrate the proposed outsourced diffusion sampling in several experiments with large pretrained prior models: conditional image generation, reinforcement learning with human feedback, and protein structure generation.
△ Less
Submitted 28 May, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training
Authors:
Julius Berner,
Lorenz Richter,
Marcin Sendera,
Jarrid Rector-Brooks,
Nikolay Malkin
Abstract:
We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of o…
▽ More
We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.
△ Less
Submitted 10 January, 2025;
originally announced January 2025.
-
Revisiting the Equivalence of Bayesian Neural Networks and Gaussian Processes: On the Importance of Learning Activations
Authors:
Marcin Sendera,
Amin Sorkhei,
Tomasz Kuśmierczyk
Abstract:
Gaussian Processes (GPs) provide a convenient framework for specifying function-space priors, making them a natural choice for modeling uncertainty. In contrast, Bayesian Neural Networks (BNNs) offer greater scalability and extendability but lack the advantageous properties of GPs. This motivates the development of BNNs capable of replicating GP-like behavior. However, existing solutions are eithe…
▽ More
Gaussian Processes (GPs) provide a convenient framework for specifying function-space priors, making them a natural choice for modeling uncertainty. In contrast, Bayesian Neural Networks (BNNs) offer greater scalability and extendability but lack the advantageous properties of GPs. This motivates the development of BNNs capable of replicating GP-like behavior. However, existing solutions are either limited to specific GP kernels or rely on heuristics.
We demonstrate that trainable activations are crucial for effective mapping of GP priors to wide BNNs. Specifically, we leverage the closed-form 2-Wasserstein distance for efficient gradient-based optimization of reparameterized priors and activations. Beyond learned activations, we also introduce trainable periodic activations that ensure global stationarity by design, and functional priors conditioned on GP hyperparameters to allow efficient model selection.
Empirically, our method consistently outperforms existing approaches or matches performance of the heuristic methods, while offering stronger theoretical foundations.
△ Less
Submitted 11 June, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
AutoLoRA: AutoGuidance Meets Low-Rank Adaptation for Diffusion Models
Authors:
Artur Kasymov,
Marcin Sendera,
Michał Stypułkowski,
Maciej Zięba,
Przemysław Spurek
Abstract:
Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models. LoRA utilizes a small number of context examples to adapt the model to a specific domain, character, style, or concept. However, due to the limited data utilized during training, the fine-tuned model performance is often characterized by strong context bias and a low degree of vari…
▽ More
Low-rank adaptation (LoRA) is a fine-tuning technique that can be applied to conditional generative diffusion models. LoRA utilizes a small number of context examples to adapt the model to a specific domain, character, style, or concept. However, due to the limited data utilized during training, the fine-tuned model performance is often characterized by strong context bias and a low degree of variability in the generated images. To solve this issue, we introduce AutoLoRA, a novel guidance technique for diffusion models fine-tuned with the LoRA approach. Inspired by other guidance techniques, AutoLoRA searches for a trade-off between consistency in the domain represented by LoRA weights and sample diversity from the base conditional diffusion model. Moreover, we show that incorporating classifier-free guidance for both LoRA fine-tuned and base models leads to generating samples with higher diversity and better quality. The experimental results for several fine-tuned LoRA domains show superiority over existing guidance techniques on selected metrics.
△ Less
Submitted 4 October, 2024;
originally announced October 2024.
-
Amortizing intractable inference in diffusion models for vision, language, and control
Authors:
Siddarth Venkatraman,
Moksh Jain,
Luca Scimeca,
Minsu Kim,
Marcin Sendera,
Mohsin Hasan,
Luke Rowe,
Sarthak Mittal,
Pablo Lemos,
Emmanuel Bengio,
Alexandre Adam,
Jarrid Rector-Brooks,
Yoshua Bengio,
Glen Berseth,
Nikolay Malkin
Abstract:
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generat…
▽ More
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning.
△ Less
Submitted 13 January, 2025; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Authors:
Tara Akhound-Sadegh,
Jarrid Rector-Brooks,
Avishek Joey Bose,
Sarthak Mittal,
Pablo Lemos,
Cheng-Hao Liu,
Marcin Sendera,
Siamak Ravanbakhsh,
Gauthier Gidel,
Yoshua Bengio,
Nikolay Malkin,
Alexander Tong
Abstract:
Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and…
▽ More
Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective, is simulation-free, and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2-5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.
△ Less
Submitted 26 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Improved off-policy training of diffusion samplers
Authors:
Marcin Sendera,
Minsu Kim,
Sarthak Mittal,
Pablo Lemos,
Luca Scimeca,
Jarrid Rector-Brooks,
Alexandre Adam,
Yoshua Bengio,
Nikolay Malkin
Abstract:
We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into…
▽ More
We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.
△ Less
Submitted 13 January, 2025; v1 submitted 7 February, 2024;
originally announced February 2024.
-
HyperShot: Few-Shot Learning by Kernel HyperNetworks
Authors:
Marcin Sendera,
Marcin Przewięźlikowski,
Konrad Karanowski,
Maciej Zięba,
Jacek Tabor,
Przemysław Spurek
Abstract:
Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting where only one element represents each class. We propose HyperShot - the fusion of kernels and hypernetwork paradigm. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our model aims to switch the cl…
▽ More
Few-shot models aim at making predictions using a minimal number of labeled examples from a given task. The main challenge in this area is the one-shot setting where only one element represents each class. We propose HyperShot - the fusion of kernels and hypernetwork paradigm. Compared to reference approaches that apply a gradient-based adjustment of the parameters, our model aims to switch the classification module parameters depending on the task's embedding. In practice, we utilize a hypernetwork, which takes the aggregated information from support data and returns the classifier's parameters handcrafted for the considered problem. Moreover, we introduce the kernel-based representation of the support examples delivered to hypernetwork to create the parameters of the classification module. Consequently, we rely on relations between embeddings of the support examples instead of direct feature values provided by the backbone models. Thanks to this approach, our model can adapt to highly different tasks.
△ Less
Submitted 21 March, 2022;
originally announced March 2022.
-
Non-Gaussian Gaussian Processes for Few-Shot Regression
Authors:
Marcin Sendera,
Jacek Tabor,
Aleksandra Nowak,
Andrzej Bedychaj,
Massimiliano Patacchiola,
Tomasz Trzciński,
Przemysław Spurek,
Maciej Zięba
Abstract:
Gaussian Processes (GPs) have been widely used in machine learning to model distributions over functions, with applications including multi-modal regression, time-series prediction, and few-shot learning. GPs are particularly useful in the last application since they rely on Normal distributions and enable closed-form computation of the posterior probability function. Unfortunately, because the re…
▽ More
Gaussian Processes (GPs) have been widely used in machine learning to model distributions over functions, with applications including multi-modal regression, time-series prediction, and few-shot learning. GPs are particularly useful in the last application since they rely on Normal distributions and enable closed-form computation of the posterior probability function. Unfortunately, because the resulting posterior is not flexible enough to capture complex distributions, GPs assume high similarity between subsequent tasks - a requirement rarely met in real-world conditions. In this work, we address this limitation by leveraging the flexibility of Normalizing Flows to modulate the posterior predictive distribution of the GP. This makes the GP posterior locally non-Gaussian, therefore we name our method Non-Gaussian Gaussian Processes (NGGPs). More precisely, we propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them. We empirically tested the flexibility of NGGPs on various few-shot learning regression datasets, showing that the mapping can incorporate context embedding information to model different noise levels for periodic functions. As a result, our method shares the structure of the problem between subsequent tasks, but the contextualization allows for adaptation to dissimilarities. NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
△ Less
Submitted 26 October, 2021;
originally announced October 2021.
-
Flow-based SVDD for anomaly detection
Authors:
Marcin Sendera,
Marek Śmieja,
Łukasz Maziarka,
Łukasz Struski,
Przemysław Spurek,
Jacek Tabor
Abstract:
We propose FlowSVDD -- a flow-based one-class classifier for anomaly/outliers detection that realizes a well-known SVDD principle using deep learning tools. Contrary to other approaches to deep SVDD, the proposed model is instantiated using flow-based models, which naturally prevents from collapsing of bounding hypersphere into a single point. Experiments show that FlowSVDD achieves comparable res…
▽ More
We propose FlowSVDD -- a flow-based one-class classifier for anomaly/outliers detection that realizes a well-known SVDD principle using deep learning tools. Contrary to other approaches to deep SVDD, the proposed model is instantiated using flow-based models, which naturally prevents from collapsing of bounding hypersphere into a single point. Experiments show that FlowSVDD achieves comparable results to the current state-of-the-art methods and significantly outperforms related deep SVDD methods on benchmark datasets.
△ Less
Submitted 10 August, 2021;
originally announced August 2021.
-
OneFlow: One-class flow for anomaly detection based on a minimal volume region
Authors:
Łukasz Maziarka,
Marek Śmieja,
Marcin Sendera,
Łukasz Struski,
Jacek Tabor,
Przemysław Spurek
Abstract:
We propose OneFlow - a flow-based one-class classifier for anomaly (outlier) detection that finds a minimal volume bounding region. Contrary to density-based methods, OneFlow is constructed in such a way that its result typically does not depend on the structure of outliers. This is caused by the fact that during training the gradient of the cost function is propagated only over the points located…
▽ More
We propose OneFlow - a flow-based one-class classifier for anomaly (outlier) detection that finds a minimal volume bounding region. Contrary to density-based methods, OneFlow is constructed in such a way that its result typically does not depend on the structure of outliers. This is caused by the fact that during training the gradient of the cost function is propagated only over the points located near to the decision boundary (behavior similar to the support vectors in SVM). The combination of flow models and a Bernstein quantile estimator allows OneFlow to find a parametric form of bounding region, which can be useful in various applications including describing shapes from 3D point clouds. Experiments show that the proposed model outperforms related methods on real-world anomaly detection problems.
△ Less
Submitted 22 September, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Data adaptation in HANDY economy-ideology model
Authors:
Marcin Sendera
Abstract:
The concept of mathematical modeling is widespread across almost all of the fields of contemporary science and engineering. Because of the existing necessity of predictions the behavior of natural phenomena, the researchers develop more and more complex models. However, despite their ability to better forecasting, the problem of an appropriate fitting ground truth data to those, high-dimensional a…
▽ More
The concept of mathematical modeling is widespread across almost all of the fields of contemporary science and engineering. Because of the existing necessity of predictions the behavior of natural phenomena, the researchers develop more and more complex models. However, despite their ability to better forecasting, the problem of an appropriate fitting ground truth data to those, high-dimensional and nonlinear models seems to be inevitable. In order to deal with this demanding problem the entire discipline of data assimilation has been developed. Basing on the Human and Nature Dynamics (HANDY) model, we have presented a detailed and comprehensive comparison of Approximate Bayesian Computation (classic data assimilation method) and a novelty approach of Supermodeling. Furthermore, with the usage of Sensitivity Analysis, we have proposed the methodology to reduce the number of coupling coefficients between submodels and as a consequence to increase the speed of the Supermodel converging. In addition, we have demonstrated that usage of Approximate Bayesian Computation method with the knowledge about parameters' sensitivities could result with satisfactory estimation of the initial parameters. However, we have also presented the mentioned methodology as unable to achieve similar predictions to Approximate Bayesian Computation. Finally, we have proved that Supermodeling with synchronization via the most sensitive variable could effect with the better forecasting for chaotic as well as more stable systems than the Approximate Bayesian Computation. What is more, we have proposed the adequate methodologies.
△ Less
Submitted 8 April, 2019;
originally announced April 2019.