-
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
Authors:
Tyler Farghly,
Patrick Rebeschini,
George Deligiannidis,
Arnaud Doucet
Abstract:
The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data -- implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.
Submitted 4 July, 2025;
originally announced July 2025.
-
Mixing Time Bounds for the Gibbs Sampler under Isoperimetry
Authors:
Alexander Goyal,
George Deligiannidis,
Nikolas Kantas
Abstract:
We establish bounds on the conductance for the systematic-scan and random-scan Gibbs samplers when the target distribution satisfies a Poincaré or log-Sobolev inequality and possesses sufficiently regular conditional distributions. These bounds lead to mixing time guarantees that extend beyond the log-concave setting, offering new insights into the convergence behavior of Gibbs sampling in broader regimes. Moreover, we demonstrate that our results remain valid for log-Lipschitz and log-smooth target distributions. Our approach relies on novel three-set isoperimetric inequalities and a sequential coupling argument for the Gibbs sampler.
Submitted 27 June, 2025;
originally announced June 2025.
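To make the two scan orders in the abstract above concrete, here is a minimal sketch of a Gibbs sampler with a systematic-scan and a random-scan variant. The target (a standard bivariate normal with correlation `rho`), the function name and its parameters are illustrative assumptions, not taken from the paper.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, rng, scan="systematic"):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Hypothetical toy target: each full conditional is N(rho * other, 1 - rho**2).
    scan="systematic" updates both coordinates in fixed order at every step;
    any other value updates one uniformly chosen coordinate per step (random scan).
    """
    cond_sd = math.sqrt(1.0 - rho * rho)
    x = [0.0, 0.0]
    samples = []
    for _ in range(n_steps):
        if scan == "systematic":
            coords = (0, 1)
        else:
            coords = (rng.randrange(2),)
        for i in coords:
            # Draw coordinate i from its conditional given the other coordinate.
            x[i] = rng.gauss(rho * x[1 - i], cond_sd)
        samples.append(tuple(x))
    return samples

rng = random.Random(0)
systematic = gibbs_bivariate_normal(0.9, 1000, rng, scan="systematic")
random_scan = gibbs_bivariate_normal(0.9, 1000, rng, scan="random")
```

With a random scan, consecutive states differ in only one coordinate, so a step of the random-scan chain does roughly half the work of a systematic sweep in this two-dimensional toy case.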
-
Conditioning Diffusions Using Malliavin Calculus
Authors:
Jakiw Pidstrigach,
Elizabeth Baker,
Carles Domingo-Enrich,
George Deligiannidis,
Nikolas Nüsken
Abstract:
In generative modelling and stochastic optimal control, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and centred around a generalisation of the Tweedie score formula to nonlinear stochastic differential equations, that enables the development of methods robust to such singularities. This allows our approach to handle a broad range of applications, like diffusion bridges, or adding conditional controls to an already trained diffusion model. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques. As a byproduct, we also introduce a novel score matching objective. Our loss functions are formulated such that they could readily be extended to manifold-valued and infinite dimensional diffusions.
Submitted 6 June, 2025; v1 submitted 4 April, 2025;
originally announced April 2025.
-
On importance sampling and independent Metropolis-Hastings with an unbounded weight function
Authors:
George Deligiannidis,
Pierre E. Jacob,
El Mahdi Khribch,
Guanyang Wang
Abstract:
Importance sampling and independent Metropolis-Hastings (IMH) are among the fundamental building blocks of Monte Carlo methods. Both require a proposal distribution that globally approximates the target distribution. The Radon-Nikodym derivative of the target distribution relative to the proposal is called the weight function. Under the assumption that the weight is unbounded but has finite moments under the proposal distribution, we study the approximation error of importance sampling and of the particle independent Metropolis-Hastings algorithm (PIMH), which includes IMH as a special case. For the chains generated by such algorithms, we show that the common random numbers coupling is maximal. Using that coupling we derive bounds on the total variation distance of a PIMH chain to its target distribution. Our results allow a formal comparison of the finite-time biases of importance sampling and IMH, and we find the latter to have a smaller bias. We further consider bias removal techniques using couplings, and provide conditions under which the resulting unbiased estimators have finite moments. These unbiased estimators provide an alternative to self-normalized importance sampling, implementable in the same settings. We compare their asymptotic efficiency as the number of particles goes to infinity, and consider their use in robust mean estimation techniques.
Submitted 14 June, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
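The two building blocks named in the abstract above both depend on the target only through the weight function. The following is a minimal sketch of self-normalized importance sampling and of a plain IMH chain (not the particle version, PIMH). All function and parameter names are hypothetical, and the weight is supplied on the log scale as an implementation choice.

```python
import math
import random

def snis_estimate(samples, weights, f):
    """Self-normalized importance sampling estimate of the target expectation of f.

    samples are draws from the proposal; weights are (possibly unnormalised)
    values of the weight function at those draws.
    """
    total = sum(weights)
    return sum(w * f(x) for x, w in zip(samples, weights)) / total

def imh_chain(log_weight, sample_proposal, n_steps, rng):
    """Independent Metropolis-Hastings: propose y from the proposal regardless of
    the current state x, accept with probability min(1, w(y) / w(x))."""
    x = sample_proposal(rng)
    lw_x = log_weight(x)
    chain, accepted = [x], 0
    for _ in range(n_steps):
        y = sample_proposal(rng)
        lw_y = log_weight(y)
        if rng.random() < math.exp(min(0.0, lw_y - lw_x)):
            x, lw_x = y, lw_y
            accepted += 1
        chain.append(x)
    return chain, accepted

# Illustrative setup (an assumption): standard normal target, N(0, 2^2) proposal.
# The log weight is known up to an additive constant, which cancels in both methods.
def log_w(x):
    return -0.5 * x * x + 0.5 * (x / 2.0) ** 2

rng = random.Random(0)
chain, accepted = imh_chain(log_w, lambda r: r.gauss(0.0, 2.0), 2000, rng)
```

In this illustrative setup the weight is bounded because the proposal has heavier tails than the target. The regime studied in the abstract is the opposite one, where the weight is unbounded but has finite moments.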
-
Linear Convergence of Diffusion Models Under the Manifold Hypothesis
Authors:
Peter Potaptchik,
Iskander Azangulov,
George Deligiannidis
Abstract:
Score-matching generative models have proven successful at sampling from complex high-dimensional data distributions. In many applications, this distribution is believed to concentrate on a much lower $d$-dimensional manifold embedded into $D$-dimensional space; this is known as the manifold hypothesis. The current best-known convergence guarantees are either linear in $D$ or polynomial (superlinear) in $d$. The latter exploits a novel integration scheme for the backward SDE. We take the best of both worlds and show that the number of steps diffusion models require in order to converge in Kullback-Leibler (KL) divergence is linear (up to logarithmic terms) in the intrinsic dimension $d$. Moreover, we show that this linear dependency is sharp.
Submitted 23 April, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions
Authors:
Iskander Azangulov,
George Deligiannidis,
Judith Rousseau
Abstract:
Denoising Diffusion Probabilistic Models (DDPM) are powerful state-of-the-art methods used to generate synthetic data from high-dimensional data distributions and are widely used for image, audio, and video generation as well as many more applications in science and beyond. The manifold hypothesis states that high-dimensional data often lie on lower-dimensional manifolds within the ambient space, and is widely believed to hold in provided examples. While recent results have provided invaluable insight into how diffusion models adapt to the manifold hypothesis, they do not capture the great empirical success of these models, making this a very fruitful research direction.
In this work, we study DDPMs under the manifold hypothesis and prove that they achieve rates independent of the ambient dimension in terms of score learning. In terms of sampling complexity, we obtain rates independent of the ambient dimension w.r.t. the Kullback-Leibler divergence, and $O(\sqrt{D})$ w.r.t. the Wasserstein distance. We do this by developing a new framework connecting diffusion models to the well-studied theory of extrema of Gaussian Processes.
Submitted 23 April, 2025; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Ranking In Generalized Linear Bandits
Authors:
Amitis Shidani,
George Deligiannidis,
Arnaud Doucet
Abstract:
We study the ranking problem in generalized linear bandits. At each time, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal as both position and item dependencies result in a complex reward function. A very naive example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies where position discount is a particular case, and connecting the ranking problem to graph theory.
Submitted 1 January, 2024; v1 submitted 30 June, 2022;
originally announced July 2022.
-
Quantitative Uniform Stability of the Iterative Proportional Fitting Procedure
Authors:
George Deligiannidis,
Valentin De Bortoli,
Arnaud Doucet
Abstract:
We establish the uniform in time stability, w.r.t. the marginals, of the Iterative Proportional Fitting Procedure, also known as the Sinkhorn algorithm, used to solve entropy-regularised Optimal Transport problems. Our result is quantitative and stated in terms of the 1-Wasserstein metric. As a corollary we establish a quantitative stability result for Schrödinger bridges.
Submitted 22 October, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
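For readers unfamiliar with the procedure in the abstract above, here is a minimal sketch of the Iterative Proportional Fitting Procedure (Sinkhorn) for two discrete marginals. The discrete setting, the function name, and the parameters `eps` and `n_iters` are illustrative assumptions; the paper's results are not restricted to this case.

```python
import math

def sinkhorn(mu, nu, cost, eps, n_iters=500):
    """Entropy-regularised optimal transport between discrete marginals mu and nu.

    Alternately rescales the rows and columns of the Gibbs kernel
    K[i][j] = exp(-cost[i][j] / eps) so that the coupling matches each marginal in turn.
    """
    m, n = len(mu), len(nu)
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(n_iters):
        # Fit the first marginal (row sums equal mu).
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        # Fit the second marginal (column sums equal nu).
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]

plan = sinkhorn([0.5, 0.5], [0.25, 0.75], [[0.0, 1.0], [1.0, 0.0]], eps=1.0)
```

Stability with respect to the marginals, in the sense of the abstract, concerns how the iterates change when `mu` and `nu` are perturbed. Rerunning the sketch with slightly different marginals gives a hands-on way to observe this.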
-
Random walk algorithm for the Dirichlet problem for parabolic integro-differential equation
Authors:
G. Deligiannidis,
S. Maurer,
M. V. Tretyakov
Abstract:
We consider stochastic differential equations (SDEs) driven by general Lévy processes with infinite activity and the related, via the Feynman-Kac formula, Dirichlet problem for a parabolic integro-differential equation (PIDE). We approximate the solution of the PIDE using a numerical method for the SDEs. The method is based on three ingredients: (i) we approximate small jumps by a diffusion; (ii) we use restricted jump-adaptive time-stepping; and (iii) between the jumps we exploit a weak Euler approximation. We prove weak convergence of the considered algorithm and present an in-depth analysis of how its error and computational cost depend on the jump activity level. Results of some numerical experiments, including pricing of barrier basket currency options, are presented.
Submitted 15 January, 2020;
originally announced January 2020.
-
Boundary of the Range of a random walk and the Fölner property
Authors:
George Deligiannidis,
Sebastien Gouezel,
Zemer Kosloff
Abstract:
The range process $R_n$ of a random walk is the collection of sites visited by the random walk up to time $n$. In this work we deal with the question of whether the range process of a random walk or the range process of a cocycle over an ergodic transformation is almost surely a Fölner sequence and show the following results:
(a) The size of the inner boundary $|\partial R_n|$ of the range of recurrent aperiodic random walks on $\mathbb{Z}^2$ with finite variance and aperiodic random walks in $\mathbb{Z}$ in the standard domain of attraction of the Cauchy distribution, divided by $\frac{n}{\log^2(n)}$, converges to a constant almost surely.
(b) We establish a formula for the Fölner asymptotic of transient cocycles over an ergodic probability preserving transformation and use it to show that for transient random walk on groups which are not virtually cyclic, for almost every path, the range is not a Fölner sequence.
(c) For aperiodic random walks in the domain of attraction of symmetric $α$-stable distributions with $1<α\leq 2$, we prove a sharp polynomial upper bound for the decay at infinity of $|\partial R_n|/|R_n|$. This last result shows that the range process of these random walks is almost surely a Fölner sequence.
Submitted 18 December, 2018; v1 submitted 24 October, 2018;
originally announced October 2018.
-
Randomized Hamiltonian Monte Carlo as Scaling Limit of the Bouncy Particle Sampler and Dimension-Free Convergence Rates
Authors:
George Deligiannidis,
Daniel Paulin,
Alexandre Bouchard-Côté,
Arnaud Doucet
Abstract:
The Bouncy Particle Sampler is a Markov chain Monte Carlo method based on a nonreversible piecewise deterministic Markov process. In this scheme, a particle explores the state space of interest by evolving according to a linear dynamics which is altered by bouncing on the hyperplane tangent to the gradient of the negative log-target density at the arrival times of an inhomogeneous Poisson Process (PP) and by randomly perturbing its velocity at the arrival times of a homogeneous PP. Under regularity conditions, we show here that the process corresponding to the first component of the particle and its corresponding velocity converges weakly towards a Randomized Hamiltonian Monte Carlo (RHMC) process as the dimension of the ambient space goes to infinity. RHMC is another piecewise deterministic non-reversible Markov process where a Hamiltonian dynamics is altered at the arrival times of a homogeneous PP by randomly perturbing the momentum component. We then establish dimension-free convergence rates for RHMC for strongly log-concave targets with bounded Hessians using coupling ideas and hypocoercivity techniques.
Submitted 23 December, 2020; v1 submitted 13 August, 2018;
originally announced August 2018.
-
Relative Complexity of Random Walks in Random Scenery in the absence of a weak invariance principle for the local times
Authors:
George Deligiannidis,
Zemer Kosloff
Abstract:
We answer the question of Aaronson about the relative complexity of Random Walks in Random Sceneries driven by either aperiodic two-dimensional random walks, two-dimensional simple random walk, or by aperiodic random walks in the domain of attraction of the Cauchy distribution. A key step is proving that the range of the random walk satisfies the Fölner property almost surely.
Submitted 1 June, 2015;
originally announced June 2015.
-
Optimal bounds for self-intersection local times
Authors:
George Deligiannidis,
Sergey Utev
Abstract:
For a random walk $S_n, n\geq 0$ in $\mathbb{Z}^d$, let $l(n,x)$ be its local time at the site $x\in \mathbb{Z}^d$. Define the $α$-fold self intersection local time $L_n(α) := \sum_{x} l(n,x)^α$, and let $L_n(α|ε, d)$ be the corresponding quantity for $d$-dimensional simple random walk. Without imposing any moment conditions, we show that the variances of the local times $\mathop{var}(L_n(α))$ of any genuinely $d$-dimensional random walk are bounded above by the corresponding characteristics of the simple symmetric random walk in $\mathbb{Z}^d$, i.e. $\mathop{var}(L_n(α)) \leq C \mathop{var}[L_n(α|ε, d)]\sim K_{d,α}v_{d,α}(n)$. In particular, variances of local times of all genuinely $d$-dimensional random walks, $d\geq 4$, are similar to the $4$-dimensional symmetric case $\mathop{var}(L_n(α)) = O(n)$. On the other hand, in dimensions $d\leq 3$ the resemblance to the simple random walk $\liminf_{n\to \infty} \mathop{var}(L_n(α))/v_{d,α}(n)>0$ implies that the jumps must have zero mean and finite second moment.
Submitted 3 June, 2015; v1 submitted 29 May, 2015;
originally announced May 2015.
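The definition $L_n(α) = \sum_x l(n,x)^α$ in the abstract above can be evaluated directly on a simulated path. This is a minimal sketch with hypothetical function names. It assumes the local time $l(n,x)$ counts visits to $x$ at times $0,\ldots,n$, a convention the abstract does not fix, and it uses the simple symmetric random walk as the example.

```python
import random
from collections import Counter

def simple_random_walk(n, d, rng):
    """Positions S_0, ..., S_n of a simple symmetric random walk on Z^d started at 0."""
    pos = [0] * d
    path = [tuple(pos)]
    for _ in range(n):
        axis = rng.randrange(d)          # pick a coordinate direction uniformly
        pos[axis] += rng.choice((-1, 1)) # step +1 or -1 along it
        path.append(tuple(pos))
    return path

def self_intersection_local_time(path, alpha):
    """L_n(alpha): sum over visited sites x of (number of visits to x) ** alpha."""
    local_times = Counter(path)
    return sum(count ** alpha for count in local_times.values())

rng = random.Random(0)
path = simple_random_walk(10_000, 2, rng)
L2 = self_intersection_local_time(path, 2)
```

Repeating the simulation over many independent paths and taking the empirical variance of `L2` gives a rough numerical view of the quantity $\mathop{var}(L_n(α))$ that the abstract bounds.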
-
Asymptotic variance of stationary reversible and normal Markov processes
Authors:
George Deligiannidis,
Magda Peligrad,
Sergey Utev
Abstract:
We obtain necessary and sufficient conditions for the regular variation of the variance of partial sums of functionals of discrete and continuous-time stationary Markov processes with normal transition operators. We also construct a class of Metropolis-Hastings algorithms which satisfy a central limit theorem and invariance principle when the variance is not linear in $n$.
Submitted 10 May, 2014;
originally announced May 2014.
-
Variance of partial sums of stationary sequences
Authors:
George Deligiannidis,
Sergey Utev
Abstract:
Let $X_1,X_2,\ldots$ be a centred sequence of weakly stationary random variables with spectral measure $F$ and partial sums $S_n=X_1+\cdots+X_n$. We show that $\operatorname {var}(S_n)$ is regularly varying of index $γ$ at infinity, if and only if $G(x):=\int_{-x}^xF(\mathrm {d}x)$ is regularly varying of index $2-γ$ at the origin ($0<γ<2$).
Submitted 21 October, 2013; v1 submitted 18 May, 2012;
originally announced May 2012.
-
An asymptotic variance of the self-intersections of random walks
Authors:
George Deligiannidis,
Sergey Utev
Abstract:
We present a Darboux-Wiener type lemma and apply it to obtain an exact asymptotic for the variance of the self-intersection of one and two-dimensional random walks. As a corollary, we obtain a central limit theorem for random walk in random scenery conjectured by Kesten and Spitzer in 1979.
Submitted 27 April, 2010;
originally announced April 2010.