-
Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems
Authors:
Teresa Klatzer,
Savvas Melidonis,
Marcelo Pereyra,
Konstantinos C. Zygalakis
Abstract:
This paper introduces a novel plug-and-play (PnP) Langevin sampling methodology for Bayesian inference in low-photon Poisson imaging problems, a challenging class of problems with significant applications in astronomy, medicine, and biology. PnP Langevin sampling algorithms offer a powerful framework for Bayesian image restoration, enabling accurate point estimation as well as advanced inference tasks, including uncertainty quantification and visualization analyses, and empirical Bayesian inference for automatic model parameter tuning. However, existing PnP Langevin algorithms are not well-suited for low-photon Poisson imaging due to high solution uncertainty and poor regularity properties, such as exploding gradients and non-negativity constraints. To address these challenges, we propose two strategies for extending Langevin PnP sampling to Poisson imaging models: (i) an accelerated PnP Langevin method that incorporates boundary reflections and a Poisson likelihood approximation and (ii) a mirror sampling algorithm that leverages a Riemannian geometry to handle the constraints and the poor regularity of the likelihood without approximations. The effectiveness of these approaches is demonstrated through extensive numerical experiments and comparisons with state-of-the-art methods.
Submitted 20 March, 2025;
originally announced March 2025.
-
A Unified Model for High-Resolution ODEs: New Insights on Accelerated Methods
Authors:
Hoomaan Maskan,
Konstantinos C. Zygalakis,
Armin Eftekhari,
Alp Yurtsever
Abstract:
Recent work on high-resolution ordinary differential equations (HR-ODEs) captures fine nuances among different momentum-based optimization methods, leading to accurate theoretical insights. However, these HR-ODEs often appear disconnected, each targeting a specific algorithm and derived with different assumptions and techniques. We present a unifying framework by showing that these diverse HR-ODEs emerge as special cases of a general HR-ODE derived using the Forced Euler-Lagrange equation. Discretizing this model recovers a wide range of optimization algorithms through different parameter choices. Using integral quadratic constraints, we also introduce a general Lyapunov function to analyze the convergence of the proposed HR-ODE and its discretizations, achieving significant improvements across various cases, including new guarantees for the triple momentum method's HR-ODE and the quasi-hyperbolic momentum method, as well as faster gradient norm minimization rates for Nesterov's accelerated gradient algorithm, among other advances.
Submitted 19 March, 2025;
originally announced March 2025.
-
Accelerated optimization algorithms and ordinary differential equations: the convex non Euclidean case
Authors:
Paul Dobson,
Jesus María Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
We study the connections between ordinary differential equations and optimization algorithms in a non-Euclidean setting. We propose a novel accelerated algorithm for minimising convex functions over a convex constrained set. This algorithm is a natural generalization of Nesterov's accelerated gradient descent method to the non-Euclidean setting and can be interpreted as an additive Runge-Kutta algorithm. The algorithm can also be derived as a numerical discretization of the ODE appearing in Krichene et al. (2015a). We use Lyapunov functions to establish convergence rates for the ODE and show that the discretizations considered achieve acceleration beyond the setting studied in Krichene et al. (2015a). Finally, we discuss how the proposed algorithm connects to various equations and algorithms in the literature.
Submitted 25 October, 2024;
originally announced October 2024.
-
Statistical modelling and Bayesian inversion for a Compton imaging system: application to radioactive source localisation
Authors:
Cecilia Tarpau,
Ming Fang,
Konstantinos C. Zygalakis,
Marcelo Pereyra,
Angela Di Fulvio,
Yoann Altmann
Abstract:
This paper presents a statistical forward model for a Compton imaging system, called a Compton imager. This system, under development at the University of Illinois Urbana-Champaign, is a variant of Compton cameras with a single type of sensor that can simultaneously act as scatterer and absorber. This imager is convenient for imaging situations requiring a wide field of view. The proposed statistical forward model is then used to solve the inverse problem of estimating the location and energy of point-like sources from observed data. This inverse problem is formulated and solved in a Bayesian framework by using a Metropolis-within-Gibbs algorithm for the estimation of the location, and an expectation-maximization algorithm for the estimation of the energy. This approach leads to more accurate estimation when compared with the standard deterministic back-projection approach, with the additional benefit of uncertainty quantification in the low-photon imaging setting.
Submitted 16 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
A hybrid tau-leap for simulating chemical kinetics with applications to parameter estimation
Authors:
Thomas Trigo Trindade,
Konstantinos C. Zygalakis
Abstract:
We consider the problem of efficiently simulating stochastic models of chemical kinetics. The Gillespie stochastic simulation algorithm (SSA) is often used to simulate these models; however, in many scenarios of interest, the computational cost quickly becomes prohibitive. This is further exacerbated in the Bayesian inference context when estimating parameters of chemical models, as the intractability of the likelihood requires multiple simulations of the underlying system. To deal with these issues of computational complexity, we propose a novel hybrid $τ$-leap algorithm for simulating well-mixed chemical systems. In particular, the algorithm uses $τ$-leap when appropriate (high population densities), and SSA when necessary (low population densities, when discrete effects become non-negligible). In the intermediate regime, a combination of the two methods, which leverages the properties of the underlying Poisson formulation, is employed. As illustrated through a number of numerical experiments, the hybrid $τ$-leap offers significant computational savings when compared to SSA without sacrificing overall accuracy. This feature is particularly welcome in the Bayesian inference context, as it allows for parameter estimation of stochastic chemical kinetics at reduced computational cost.
Submitted 9 July, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
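To make the two ingredients named in the abstract above concrete, the following is a minimal sketch of the exact SSA and a plain fixed-step $τ$-leap for a single decay reaction A → ∅ with propensity c·n. It is not the hybrid algorithm of the paper: the reaction, the function names (`ssa_decay`, `tau_leap_decay`, `poisson`), and the parameter values are illustrative assumptions, and the clamping at zero is a crude safeguard chosen for this sketch.

```python
import math
import random


def ssa_decay(n0, c, t_end, rng):
    """Exact Gillespie SSA for the single reaction A -> 0 with propensity c * n."""
    t, n = 0.0, n0
    while n > 0:
        # The waiting time to the next reaction is exponential with rate c * n.
        t += rng.expovariate(c * n)
        if t > t_end:
            break
        n -= 1
    return n


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small means used in this sketch."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def tau_leap_decay(n0, c, t_end, tau, rng):
    """Fixed-step tau-leap; assumes t_end is (close to) a multiple of tau."""
    n = n0
    for _ in range(round(t_end / tau)):
        # Fire a Poisson number of reactions with the propensity frozen over the step.
        fired = poisson(c * n * tau, rng)
        # Illustrative safeguard: never let the population go negative.
        n = max(n - fired, 0)
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    print("SSA sample:     ", ssa_decay(200, 1.0, 1.0, rng))
    print("tau-leap sample:", tau_leap_decay(200, 1.0, 1.0, 0.01, rng))
```

The SSA simulates every reaction event, so its cost grows with the population, while the $τ$-leap cost depends only on the number of steps. At low populations the frozen-propensity approximation and the clamping become inaccurate, which is the regime where the abstract says discrete effects matter.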
-
A Variational Perspective on High-Resolution ODEs
Authors:
Hoomaan Maskan,
Konstantinos C. Zygalakis,
Alp Yurtsever
Abstract:
We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using the forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence rate for gradient norm minimization using Nesterov's accelerated gradient method. Additionally, we show that Nesterov's method can be interpreted as a rate-matching discretization of an appropriately chosen high-resolution ODE. Finally, using the results from the new variational perspective, we propose a stochastic method for noisy gradients. Several numerical experiments illustrate our stochastic algorithm and compare it with state-of-the-art methods.
Submitted 3 November, 2023;
originally announced November 2023.
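For readers unfamiliar with the method this abstract refers to, here is a minimal sketch of Nesterov's accelerated gradient method in its standard textbook form (step size 1/L with the FISTA-style momentum sequence). It does not implement the paper's high-resolution ODE analysis or its stochastic variant; the diagonal quadratic test function and all names (`f`, `grad_f`, `nesterov`) are assumptions made for illustration.

```python
import math


def f(x, diag):
    """Smooth convex quadratic f(x) = 0.5 * sum(d_i * x_i**2), minimised at 0."""
    return 0.5 * sum(d * xi * xi for d, xi in zip(diag, x))


def grad_f(x, diag):
    """Gradient of f; its Lipschitz constant is L = max(diag)."""
    return [d * xi for d, xi in zip(diag, x)]


def nesterov(x0, diag, n_iters):
    """Accelerated gradient method; returns the last iterate and f at each iterate."""
    L = max(diag)
    x_prev, y, t = list(x0), list(x0), 1.0
    history = []
    for _ in range(n_iters):
        # Gradient step taken from the extrapolated point y.
        x = [yi - gi / L for yi, gi in zip(y, grad_f(y, diag))]
        # Momentum weights from the sequence t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
        history.append(f(x, diag))
    return x_prev, history


if __name__ == "__main__":
    _, values = nesterov([1.0, 1.0], [1.0, 100.0], 50)
    print("f after 50 iterations:", values[-1])
```

With this momentum sequence, the classical guarantee is f(x_k) - f* ≤ 2L‖x_0 - x*‖² / (k+1)², which is the function-value rate; the gradient-norm rates discussed in the abstract are a separate question that this sketch does not address.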
-
Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling
Authors:
Teresa Klatzer,
Paul Dobson,
Yoann Altmann,
Marcelo Pereyra,
Jesús María Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems with an underlying convex geometry. The proposed strategy takes the form of a stochastic relaxed proximal-point iteration that admits two complementary interpretations. For models that are smooth or regularised by Moreau-Yosida smoothing, the algorithm is equivalent to an implicit midpoint discretisation of an overdamped Langevin diffusion targeting the posterior distribution of interest. This discretisation is asymptotically unbiased for Gaussian targets and shown to converge in an accelerated manner for any target that is $κ$-strongly log-concave (i.e., requiring in the order of $\sqrt{κ}$ iterations to converge, similarly to accelerated optimisation schemes), comparing favorably to [M. Pereyra, L. Vargas Mieles, K.C. Zygalakis, SIAM J. Imaging Sciences, 13,2 (2020), pp. 905-935] which is only provably accelerated for Gaussian targets and has bias. For models that are not smooth, the algorithm is equivalent to a Leimkuhler-Matthews discretisation of a Langevin diffusion targeting a Moreau-Yosida approximation of the posterior distribution of interest, and hence achieves a significantly lower bias than conventional unadjusted Langevin strategies based on the Euler-Maruyama discretisation. For targets that are $κ$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step which maximizes the convergence speed. The proposed methodology is demonstrated through a range of experiments related to image deconvolution with Gaussian and Poisson noise, with assumption-driven and data-driven convex priors. Source codes for the numerical experiments of this paper are available from https://github.com/MI2G/accelerated-langevin-imla.
Submitted 12 January, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
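The bias statements in the abstract above can be illustrated with a one-dimensional worked example that is not taken from the paper. It assumes a standard Gaussian target and writes the implicit midpoint step in one common form; the paper's exact scheme may differ.

```latex
% Assumed target: \pi = N(0,1), i.e. U(x) = x^2/2 and \nabla U(x) = x.
% Euler-Maruyama (unadjusted Langevin) step with step size h \in (0, 2):
X_{k+1} = (1 - h)\, X_k + \sqrt{2h}\, \xi_k, \qquad \xi_k \sim N(0, 1).
% A stationary variance \sigma_h^2 must satisfy
\sigma_h^2 = (1 - h)^2 \sigma_h^2 + 2h
\quad\Longrightarrow\quad
\sigma_h^2 = \frac{2h}{1 - (1 - h)^2} = \frac{1}{1 - h/2} = 1 + \frac{h}{2} + O(h^2).

% Implicit midpoint step (assumed form):
X_{k+1} = X_k - \frac{h}{2}\,(X_k + X_{k+1}) + \sqrt{2h}\, \xi_k
\quad\Longleftrightarrow\quad
X_{k+1} = a\, X_k + \frac{\sqrt{2h}}{1 + h/2}\, \xi_k, \qquad a = \frac{1 - h/2}{1 + h/2}.
% Its stationary variance is
\sigma^2 = \frac{2h}{(1 + h/2)^2\,(1 - a^2)} = 1,
\qquad \text{since} \quad 1 - a^2 = \frac{2h}{(1 + h/2)^2}.
```

Under these assumptions the Euler-Maruyama chain overestimates the variance by roughly h/2, while the midpoint chain reproduces the target variance for every step size, consistent with the abstract's claim about Gaussian targets.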
-
Gaussian processes for Bayesian inverse problems associated with linear partial differential equations
Authors:
Tianming Bai,
Aretha L. Teckentrup,
Konstantinos C. Zygalakis
Abstract:
This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime, the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inversion. We extend the framework of Raissi et al. (2017) to construct PDE-informed Gaussian priors that we then use to construct different approximate posteriors. A number of different numerical experiments illustrate the superiority of the PDE-informed Gaussian priors over more traditional priors.
Submitted 17 July, 2023;
originally announced July 2023.
-
The split Gibbs sampler revisited: improvements to its algorithmic structure and augmented target distribution
Authors:
Marcelo Pereyra,
Luis A. Vargas-Mieles,
Konstantinos C. Zygalakis
Abstract:
Developing efficient Bayesian computation algorithms for imaging inverse problems is challenging due to the dimensionality involved and because Bayesian imaging models are often not smooth. Current state-of-the-art methods often address these difficulties by replacing the posterior density with a smooth approximation that is amenable to efficient exploration by using Langevin Markov chain Monte Carlo (MCMC) methods. An alternative approach is based on data augmentation and relaxation, where auxiliary variables are introduced in order to construct an approximate augmented posterior distribution that is amenable to efficient exploration by Gibbs sampling. This paper proposes a new accelerated proximal MCMC method called latent space SK-ROCK (ls SK-ROCK), which tightly combines the benefits of the two aforementioned strategies. Additionally, instead of viewing the augmented posterior distribution as an approximation of the original model, we propose to consider it as a generalisation of this model. Following on from this, we empirically show that there is a range of values for the relaxation parameter for which the accuracy of the model improves, and propose a stochastic optimisation algorithm to automatically identify the optimal amount of relaxation for a given problem. In this regime, ls SK-ROCK converges faster than competing approaches from the state of the art, and also achieves better accuracy since the underlying augmented Bayesian model has a higher Bayesian evidence. The proposed methodology is demonstrated with a range of numerical experiments related to image deblurring and inpainting, as well as with comparisons with alternative approaches from the state of the art. An open-source implementation of the proposed MCMC methods is available from https://github.com/luisvargasmieles/ls-MCMC.
Submitted 3 May, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Efficient Bayesian computation for low-photon imaging problems
Authors:
Savvas Melidonis,
Paul Dobson,
Yoann Altmann,
Marcelo Pereyra,
Konstantinos C. Zygalakis
Abstract:
This paper studies a new and highly efficient Markov chain Monte Carlo (MCMC) methodology to perform Bayesian inference in low-photon imaging problems, with particular attention to situations involving observation noise processes that deviate significantly from Gaussian noise, such as binomial, geometric and low-intensity Poisson noise. These problems are challenging for many reasons. From an inferential viewpoint, low-photon numbers lead to severe identifiability issues, poor stability and high uncertainty about the solution. Moreover, low-photon models often exhibit poor regularity properties that make efficient Bayesian computation difficult; e.g., hard non-negativity constraints, non-smooth priors, and log-likelihood terms with exploding gradients. More precisely, the lack of suitable regularity properties hinders the use of state-of-the-art Monte Carlo methods based on numerical approximations of the Langevin stochastic differential equation (SDE), as both the SDE and its numerical approximations behave poorly. We address this difficulty by proposing an MCMC methodology based on a reflected and regularised Langevin SDE, which is shown to be well-posed and exponentially ergodic under mild and easily verifiable conditions. This then allows us to derive four reflected proximal Langevin MCMC algorithms to perform Bayesian computation in low-photon imaging problems. The proposed approach is demonstrated with a range of experiments related to image deblurring, denoising, and inpainting under binomial, geometric and Poisson noise.
Submitted 10 June, 2022;
originally announced June 2022.
-
A Hierarchy of Network Models Giving Bistability Under Triadic Closure
Authors:
Stefano Di Giovacchino,
Desmond J. Higham,
Konstantinos C. Zygalakis
Abstract:
Triadic closure describes the tendency for new friendships to form between individuals who already have friends in common. It has been argued heuristically that the triadic closure effect can lead to bistability in the formation of large-scale social interaction networks. Here, depending on the initial state and the transient dynamics, the system may evolve towards either of two long-time states. In this work, we propose and study a hierarchy of network evolution models that incorporate triadic closure, building on the work of Grindrod, Higham, and Parsons [Internet Mathematics, 8, 2012, 402--423]. We use a chemical kinetics framework, paying careful attention to the reaction rate scaling with respect to the system size. In a macroscale regime, we show rigorously that a bimodal steady-state distribution is admitted. This behavior corresponds to the existence of two distinct stable fixed points in a deterministic mean-field ODE. The macroscale model is also seen to capture an apparent metastability property of the microscale system.
Computational simulations are used to support the analysis.
Submitted 10 November, 2021;
originally announced November 2021.
-
Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations
Authors:
J. M. Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $κ$, the algorithm is shown to produce, with $\mathcal{O}\big(κ^{5/4} d^{1/4}ε^{-1/2} \big)$ complexity, samples from a distribution that, in Wasserstein distance, is at most $ε>0$ away from the target distribution.
Submitted 24 September, 2021; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Bayesian Imaging With Data-Driven Priors Encoded by Neural Networks: Theory, Methods, and Algorithms
Authors:
Matthew Holden,
Marcelo Pereyra,
Konstantinos C. Zygalakis
Abstract:
This paper proposes a new methodology for performing Bayesian inference in imaging inverse problems where the prior knowledge is available in the form of training data. Following the manifold hypothesis and adopting a generative modelling approach, we construct a data-driven prior that is supported on a sub-manifold of the ambient space, which we can learn from the training data by using a variational autoencoder or a generative adversarial network. We establish the existence and well-posedness of the associated posterior distribution and posterior moments under easily verifiable conditions, providing a rigorous underpinning for Bayesian estimators and uncertainty quantification analyses. Bayesian computation is performed by using a parallel tempered version of the preconditioned Crank-Nicolson algorithm on the manifold, which is shown to be ergodic and robust to the non-convex nature of these data-driven models. In addition to point estimators and uncertainty quantification analyses, we derive a model misspecification test to automatically detect situations where the data-driven prior is unreliable, and explain how to identify the dimension of the latent space directly from the training data. The proposed approach is illustrated with a range of experiments with the MNIST dataset, where it outperforms alternative image reconstruction approaches from the state of the art. A model accuracy analysis suggests that the Bayesian probabilities reported by the data-driven models are also remarkably accurate under a frequentist definition of probability.
Submitted 18 March, 2021;
originally announced March 2021.
-
The blending region hybrid framework for the simulation of stochastic reaction-diffusion processes
Authors:
Christian A. Yates,
Adam George,
Armand Jordana,
Cameron A. Smith,
Andrew B. Duncan,
Konstantinos C. Zygalakis
Abstract:
The simulation of stochastic reaction-diffusion systems using fine-grained representations can become computationally prohibitive when particle numbers become large. If particle numbers are sufficiently high then it may be possible to ignore stochastic fluctuations and use a more efficient coarse-grained simulation approach. Nevertheless, for multiscale systems which exhibit significant spatial variation in concentration, a coarse-grained approach may not be appropriate throughout the simulation domain. Such scenarios suggest a hybrid paradigm in which a computationally cheap, coarse-grained model is coupled to a more expensive, but more detailed fine-grained model enabling the accurate simulation of the fine-scale dynamics at a reasonable computational cost.
In this paper, in order to couple two representations of reaction-diffusion at distinct spatial scales, we allow them to overlap in a "blending region". Both modelling paradigms provide a valid representation of the particle density in this region. From one end of the blending region to the other, control of the implementation of diffusion is passed from one modelling paradigm to another through the use of complementary "blending functions" which scale up or down the contribution of each model to the overall diffusion. We establish the reliability of our novel hybrid paradigm by demonstrating its simulation on four exemplar reaction-diffusion scenarios.
Submitted 30 September, 2020;
originally announced October 2020.
-
A Linear Transportation $\mathrm{L}^p$ Distance for Pattern Recognition
Authors:
Oliver M. Crook,
Mihai Cucuringu,
Tim Hurst,
Carola-Bibiane Schönlieb,
Matthew Thorpe,
Konstantinos C. Zygalakis
Abstract:
The transportation $\mathrm{L}^p$ distance, denoted $\mathrm{TL}^p$, has been proposed as a generalisation of Wasserstein $\mathrm{W}^p$ distances motivated by the property that it can be applied directly to colour or multi-channelled images, as well as multivariate time-series without normalisation or mass constraints. These distances, as with $\mathrm{W}^p$, are powerful tools in modelling data with spatial or temporal perturbations. However, their computational cost can make them infeasible to apply to even moderate pattern recognition tasks. We propose linear versions of these distances and show that the linear $\mathrm{TL}^p$ distance significantly improves over the linear $\mathrm{W}^p$ distance on signal processing tasks, whilst being several orders of magnitude faster to compute than the $\mathrm{TL}^p$ distance.
Submitted 23 September, 2020;
originally announced September 2020.
-
The connections between Lyapunov functions for some optimization algorithms and differential equations
Authors:
J. M. Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
In this manuscript, we study the properties of a family of second-order differential equations with damping, its discretizations and their connections with accelerated optimization algorithms for $m$-strongly convex and $L$-smooth functions. In particular, using the linear matrix inequality (LMI) framework developed by Fazlyab et al. (2018), we derive analytically a (discrete) Lyapunov function for a two-parameter family of Nesterov optimization methods, which allows for the complete characterization of their convergence rate. In the appropriate limit, this family of methods may be seen as a discretization of a family of second-order ordinary differential equations for which we construct (continuous) Lyapunov functions by means of the LMI framework. The continuous Lyapunov functions may alternatively be obtained by studying the limiting behaviour of their discrete counterparts. Finally, we show that the majority of typical discretizations of the family of ODEs, such as the heavy ball method, do not possess Lyapunov functions with properties similar to those of the Lyapunov function constructed here for the Nesterov method.
Submitted 11 January, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics
Authors:
Konstantin Rusch,
John W. Pearson,
Konstantinos C. Zygalakis
Abstract:
Recurrent neural networks (RNNs) have gained a great deal of attention in solving sequential learning problems. The learning of long-term dependencies, however, remains challenging due to the problem of a vanishing or exploding hidden states gradient. By exploring further the recently established connections between RNNs and dynamical systems, we propose a novel RNN architecture, which we call a Hamiltonian recurrent neural network (Hamiltonian RNN), based on a symplectic discretization of an appropriately chosen Hamiltonian system. The key benefit of this approach is that the corresponding RNN inherits the favorable long-time properties of the Hamiltonian system, which in turn allows us to control the hidden states gradient with a hyperparameter of the Hamiltonian RNN architecture. This enables us to handle sequential learning problems with arbitrary sequence lengths, since for a range of values of this hyperparameter the gradient neither vanishes nor explodes. Additionally, we provide a heuristic for the optimal choice of the hyperparameter, which we use in our numerical simulations to illustrate that the Hamiltonian RNN is able to outperform other state-of-the-art RNNs without the need for computationally intensive hyperparameter optimization.
Submitted 16 March, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
PDE-Inspired Algorithms for Semi-Supervised Learning on Point Clouds
Authors:
Oliver M. Crook,
Tim Hurst,
Carola-Bibiane Schönlieb,
Matthew Thorpe,
Konstantinos C. Zygalakis
Abstract:
Given a data set and a subset of labels, the problem of semi-supervised learning on point clouds is to extend the labels to the entire data set. In this paper we extend the labels by minimising the constrained discrete $p$-Dirichlet energy. Under suitable conditions the discrete problem can be connected, in the large data limit, with the minimiser of a weighted continuum $p$-Dirichlet energy with the same constraints. We take advantage of this connection by designing numerical schemes that first estimate the density of the data and then apply PDE methods, such as pseudo-spectral methods, to solve the corresponding Euler-Lagrange equation. We prove that our scheme is consistent in the large data limit for two methods of density estimation: kernel density estimation and spline kernel density estimation.
Submitted 23 September, 2019;
originally announced September 2019.
-
Contractivity of Runge-Kutta methods for convex gradient systems
Authors:
J. M. Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
We consider the application of Runge-Kutta (RK) methods to gradient systems $(d/dt)x = -\nabla V(x)$, where, as in many optimization problems, $V$ is convex and $\nabla V$ (globally) Lipschitz-continuous with Lipschitz constant $L$. Solutions of this system behave contractively, i.e. the Euclidean distance between two solutions $x(t)$ and $\widetilde{x}(t)$ is a nonincreasing function of $t$. It is then of interest to investigate whether a similar contraction takes place, at least for suitably small step sizes $h$, for the discrete solution. Results of Dahlquist and Jeltsch imply that (1) there are explicit RK schemes that behave contractively whenever $Lh$ is below a scheme-dependent constant and (2) Euler's rule is optimal in this regard. We prove, however, by explicit construction of a convex potential using ideas from robust control theory, that there exist RK schemes that fail to behave contractively for any choice of the time-step $h$.
Submitted 31 March, 2021; v1 submitted 22 September, 2019;
originally announced September 2019.
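The contractivity notion in the abstract above can be checked numerically in the simplest case. The following is a minimal sketch, not taken from the paper: it assumes a diagonal quadratic potential $V(x) = \tfrac{1}{2}\sum_i \lambda_i x_i^2$ with $0 \le \lambda_i \le L$ (so $V$ is convex and $\nabla V$ is $L$-Lipschitz) and applies the explicit Euler rule to two initial conditions. The names `grad_V`, `euler_step`, and `distances`, and the chosen eigenvalues and step sizes, are illustrative assumptions. For this potential each Euler step multiplies the $i$-th component of the difference by $1 - h\lambda_i$, so the distance cannot grow when $hL \le 2$ and can grow when $hL > 2$.

```python
import math

def grad_V(x, lams):
    # Gradient of the assumed potential V(x) = 0.5 * sum(lam_i * x_i**2).
    return [lam * xi for lam, xi in zip(lams, x)]

def euler_step(x, h, lams):
    # One explicit Euler step for the gradient system dx/dt = -grad V(x).
    g = grad_V(x, lams)
    return [xi - h * gi for xi, gi in zip(x, g)]

def distances(x, y, h, lams, n_steps):
    # Euclidean distance between two discrete solutions after each step.
    out = [math.dist(x, y)]
    for _ in range(n_steps):
        x = euler_step(x, h, lams)
        y = euler_step(y, h, lams)
        out.append(math.dist(x, y))
    return out

lams = [0.0, 0.5, 4.0]          # illustrative eigenvalues; Lipschitz constant L = 4
L = max(lams)
x0 = [1.0, -2.0, 0.5]
y0 = [0.0, 1.0, -1.5]

small = distances(x0, y0, 1.9 / L, lams, 20)   # h*L = 1.9: contractive regime
large = distances(x0, y0, 2.5 / L, lams, 20)   # h*L = 2.5: distance can grow

print("h*L = 1.9, nonincreasing:", all(b <= a + 1e-12 for a, b in zip(small, small[1:])))
print("h*L = 2.5, final/initial distance:", large[-1] / large[0])
```

A quadratic example only illustrates the definition; the paper's negative result concerns other RK schemes and a specially constructed convex potential, which this sketch does not reproduce.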
-
Accelerating proximal Markov chain Monte Carlo by using an explicit stabilised method
Authors:
Luis Vargas,
Marcelo Pereyra,
Konstantinos C. Zygalakis
Abstract:
We present a highly efficient proximal Markov chain Monte Carlo methodology to perform Bayesian computation in imaging problems. Similarly to previous proximal Monte Carlo approaches, the proposed method is derived from an approximation of the Langevin diffusion. However, instead of the conventional Euler-Maruyama approximation that underpins existing proximal Monte Carlo methods, here we use a state-of-the-art orthogonal Runge-Kutta-Chebyshev stochastic approximation that combines several gradient evaluations to significantly accelerate its convergence speed, similarly to accelerated gradient optimisation methods. The proposed methodology is demonstrated via a range of numerical experiments, including non-blind image deconvolution, hyperspectral unmixing, and tomographic reconstruction, with total-variation and $\ell_1$-type priors. Comparisons with Euler-type proximal Monte Carlo methods confirm that the Markov chains generated with our method exhibit significantly faster convergence speeds, achieve larger effective sample sizes, and produce lower mean square estimation errors at equal computational budget.
Submitted 19 March, 2020; v1 submitted 23 August, 2019;
originally announced August 2019.
-
Explicit Stabilised Gradient Descent for Faster Strongly Convex Optimisation
Authors:
Armin Eftekhari,
Bart Vandereycken,
Gilles Vilmart,
Konstantinos C. Zygalakis
Abstract:
This paper introduces the Runge-Kutta Chebyshev descent method (RKCD) for strongly convex optimisation problems. This new algorithm is based on explicit stabilised integrators for stiff differential equations, a powerful class of numerical schemes that avoid the severe step size restriction faced by standard explicit integrators. For optimising quadratic and strongly convex functions, this paper proves that RKCD nearly achieves the optimal convergence rate of the conjugate gradient algorithm, and the suboptimality of RKCD diminishes as the condition number of the quadratic function worsens. It is established that this optimal rate is obtained also for a partitioned variant of RKCD applied to perturbations of quadratic functions. In addition, numerical experiments on general strongly convex problems show that RKCD outperforms Nesterov's accelerated gradient descent.
Submitted 27 June, 2020; v1 submitted 18 May, 2018;
originally announced May 2018.
-
Noise Control for DNA Computing
Authors:
Tomislav Plesa,
Konstantinos C. Zygalakis,
David F. Anderson,
Radek Erban
Abstract:
Synthetic biology is a growing interdisciplinary field, with far-reaching applications, which aims to design biochemical systems that behave in a desired manner. With the advancement of strand-displacement DNA computing, a large class of abstract biochemical networks may be physically realized using DNA molecules. Methods for systematic design of the abstract systems with prescribed behaviors have been predominantly developed at the (less-detailed) deterministic level. However, stochastic effects, neglected at the deterministic level, are increasingly found to play an important role in biochemistry. In such circumstances, methods for controlling the intrinsic noise in the system are necessary for a successful network design at the (more-detailed) stochastic level. To bridge the gap, the noise-control algorithm for designing biochemical networks is developed in this paper. The algorithm structurally modifies any given reaction network under mass-action kinetics, in such a way that (i) controllable state-dependent noise is introduced into the stochastic dynamics, while (ii) the deterministic dynamics are preserved. The capabilities of the algorithm are demonstrated on a production-decay reaction system, and on an exotic system displaying bistability. For the production-decay system, it is shown that the algorithm may be used to redesign the network to achieve noise-induced multistability. For the exotic system, the algorithm is used to redesign the network to control the stochastic switching, and achieve noise-induced oscillations.
Submitted 20 June, 2017; v1 submitted 25 May, 2017;
originally announced May 2017.
-
Uncertainty quantification in graph-based classification of high dimensional data
Authors:
Andrea L. Bertozzi,
Xiyang Luo,
Andrew M. Stuart,
Konstantinos C. Zygalakis
Abstract:
Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning.
We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003].
We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.
Submitted 8 February, 2018; v1 submitted 26 March, 2017;
originally announced March 2017.
-
Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation
Authors:
A. B. Duncan,
G. A. Pavliotis,
K. C. Zygalakis
Abstract:
For a given target density, there exist an infinite number of diffusion processes which are ergodic with respect to this density. As observed in a number of papers, samplers based on nonreversible diffusion processes can significantly outperform their reversible counterparts both in terms of asymptotic variance and rate of convergence to equilibrium. In this paper, we take advantage of this in order to construct efficient sampling algorithms based on the Lie-Trotter decomposition of a nonreversible diffusion process into reversible and nonreversible components. We show that samplers based on this scheme can significantly outperform standard MCMC methods, at the cost of introducing some controlled bias. In particular, we prove that numerical integrators constructed according to this decomposition are geometrically ergodic and characterise fully their asymptotic bias and variance, showing that the sampler inherits the good mixing properties of the underlying nonreversible diffusion. This is illustrated further with a number of numerical examples ranging from highly correlated low dimensional distributions, to logistic regression problems in high dimensions as well as inference for spatial models with many latent variables.
Submitted 16 January, 2017;
originally announced January 2017.
-
Fast Langevin based algorithm for MCMC in high dimensions
Authors:
Alain Durmus,
Gareth O. Roberts,
Gilles Vilmart,
Konstantinos C. Zygalakis
Abstract:
We introduce new Gaussian proposals to improve the efficiency of the standard Hastings-Metropolis algorithm in Markov chain Monte Carlo (MCMC) methods, used for sampling from a target distribution in high dimension $d$. The improved complexity is $\mathcal{O}(d^{1/5})$ compared to the complexity $\mathcal{O}(d^{1/3})$ of the standard approach. We prove an asymptotic diffusion limit theorem and show that the relative efficiency of the algorithm can be characterised by its overall acceptance rate (with asymptotic value 0.704), independently of the target distribution. Numerical experiments confirm our theoretical findings.
Submitted 25 November, 2016; v1 submitted 8 July, 2015;
originally announced July 2015.
-
Data Assimilation: A Mathematical Introduction
Authors:
K. J. H. Law,
A. M. Stuart,
K. C. Zygalakis
Abstract:
These notes provide a systematic mathematical treatment of the subject of data assimilation.
Submitted 25 June, 2015;
originally announced June 2015.
-
Entropy, Ergodicity and Stem Cell Multipotency
Authors:
Sonya J. Ridden,
Hannah H. Chang,
Konstantinos C. Zygalakis,
Ben D. MacArthur
Abstract:
Populations of mammalian stem cells commonly exhibit considerable cell-cell variability. However, the functional role of this diversity is unclear. Here, we analyze expression fluctuations of the stem cell surface marker Sca1 in mouse hematopoietic progenitor cells using a simple stochastic model and find that the observed dynamics naturally lie close to a critical state, thereby producing a diverse population that is able to respond rapidly to environmental changes. We propose an information-theoretic interpretation of these results that views cellular multipotency as an instance of maximum entropy statistical inference.
Submitted 16 October, 2015; v1 submitted 27 April, 2015;
originally announced April 2015.
-
(Non-) asymptotic properties of Stochastic Gradient Langevin Dynamics
Authors:
Sebastian J. Vollmer,
Konstantinos C. Zygalakis,
Yee Whye Teh
Abstract:
Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally infeasible. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem in three ways: it generates proposed moves using only a subset of the data, it skips the Metropolis-Hastings accept-reject step, and it uses sequences of decreasing step sizes. In \cite{TehThierryVollmerSGLD2014}, we provided the mathematical foundations for the decreasing step size SGLD, including consistency and a central limit theorem. However, in practice the SGLD is run for a relatively small number of iterations, and its step size is not decreased to zero. The present article investigates the behaviour of the SGLD with fixed step size. In particular we characterise the asymptotic bias explicitly, along with its dependence on the step size and the variance of the stochastic gradient. On that basis a modified SGLD which removes the asymptotic bias due to the variance of the stochastic gradients up to first order in the step size is derived. Moreover, we are able to obtain bounds on the finite-time bias, variance and mean squared error (MSE). The theory is illustrated with a Gaussian toy model for which the bias and the MSE for the estimation of moments can be obtained explicitly. For this toy model we study the gain of the SGLD over the standard Euler method in the limit of large data sets.
Submitted 21 September, 2015; v1 submitted 2 January, 2015;
originally announced January 2015.
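The fixed-step SGLD iteration described in the abstract above can be sketched on a Gaussian toy problem. This is a minimal illustration under stated assumptions, not the authors' code or experimental setup: the model (unit-variance Gaussian likelihood with a standard normal prior on the mean), the function name `sgld_gaussian`, and the values of the step size `h`, the mini-batch size, and the burn-in length are all chosen here for illustration. Each step uses a mini-batch estimate of the log-posterior gradient, adds Gaussian noise with variance `h`, and skips any accept-reject correction.

```python
import math
import random

def sgld_gaussian(data, n_batch, h, n_steps, rng):
    # Fixed-step SGLD for the assumed model x_i ~ N(theta, 1), theta ~ N(0, 1).
    N = len(data)
    theta = 0.0
    chain = []
    for _ in range(n_steps):
        batch = rng.sample(data, n_batch)              # subsample without replacement
        grad_prior = -theta                            # d/dtheta of log N(theta; 0, 1)
        grad_lik = (N / n_batch) * sum(x - theta for x in batch)  # rescaled mini-batch gradient
        noise = math.sqrt(h) * rng.gauss(0.0, 1.0)     # injected Langevin noise
        theta += 0.5 * h * (grad_prior + grad_lik) + noise
        chain.append(theta)
    return chain

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(100)]       # synthetic data, illustrative only

chain = sgld_gaussian(data, n_batch=10, h=1e-3, n_steps=20000, rng=rng)
burn_in = 1000
kept = chain[burn_in:]
sgld_mean = sum(kept) / len(kept)

# For this conjugate model the posterior is N(sum(data) / (N + 1), 1 / (N + 1)).
exact_mean = sum(data) / (len(data) + 1)
print("SGLD estimate of posterior mean:", sgld_mean)
print("Exact posterior mean:", exact_mean)
```

Because the drift is linear in this toy model, the mini-batch noise mainly inflates the spread of the chain rather than shifting its mean, so comparing the sample variance of `kept` with `1 / (N + 1)` is one simple way to see the fixed-step bias the abstract refers to.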
-
Accuracy and Stability of The Continuous-Time 3DVAR Filter for The Navier-Stokes Equation
Authors:
D. Bloemker,
K. J. H. Law,
A. M. Stuart,
K. C. Zygalakis
Abstract:
The 3DVAR filter is prototypical of methods used to combine observed data with a dynamical system, online, in order to improve estimation of the state of the system. Such methods are used for high dimensional data assimilation problems, such as those arising in weather forecasting. To gain understanding of filters in applications such as these, it is hence of interest to study their behaviour when applied to infinite dimensional dynamical systems. This motivates study of the problem of accuracy and stability of 3DVAR filters for the Navier-Stokes equation.
We work in the limit of high-frequency observations and derive continuous-time filters. This leads to a stochastic partial differential equation (SPDE) for state estimation, in the form of a damped-driven Navier-Stokes equation, with mean-reversion to the signal, and spatially-correlated time-white noise. Both forward and pullback accuracy and stability results are proved for this SPDE, showing in particular that when enough low Fourier modes are observed, and when the model uncertainty is larger than the data uncertainty in these modes (variance inflation), then the filter can lock on to a small neighbourhood of the true signal, recovering from order one initial error, if the error in the observed modes is small. Numerical examples are given to illustrate the theory.
Submitted 13 October, 2012; v1 submitted 4 October, 2012;
originally announced October 2012.
-
Analysis of Brownian Dynamics Simulations of Reversible Bimolecular Reactions
Authors:
J. Lipkova,
K. C. Zygalakis,
S. J. Chapman,
R. Erban
Abstract:
A class of Brownian dynamics algorithms for stochastic reaction-diffusion models which include reversible bimolecular reactions is presented and analyzed. The method is a generalization of the $λ$--$\newrho$ model for irreversible bimolecular reactions which was introduced in [arXiv:0903.1298]. The formulae relating the experimentally measurable quantities (reaction rate constants and diffusion constants) with the algorithm parameters are derived. The probability of geminate recombination is also investigated.
Submitted 5 May, 2010;
originally announced May 2010.
-
Homogenization for advection-diffusion in a perforated domain
Authors:
P. H. Haynes,
V. H. Hoang,
J. R. Norris,
K. C. Zygalakis
Abstract:
The volume of a Wiener sausage constructed from a diffusion process with periodic, mean-zero, divergence-free velocity field, in dimension 3 or more, is shown to have a non-random and positive asymptotic rate of growth. This is used to establish the existence of a homogenized limit for such a diffusion when subject to Dirichlet conditions on the boundaries of a sparse and independent array of obstacles. There is a constant effective long-time loss rate at the obstacles. The dependence of this rate on the form and intensity of the obstacles and on the velocity field is investigated. A Monte Carlo algorithm for the computation of the volume growth rate of the sausage is introduced and some numerical results are presented for the Taylor--Green velocity field.
Submitted 25 March, 2010; v1 submitted 21 March, 2010;
originally announced March 2010.
-
Calculating Effective Diffusivities in the Limit of Vanishing Molecular Diffusion
Authors:
G. A. Pavliotis,
A. M. Stuart,
K. C. Zygalakis
Abstract:
In this paper we study the problem of the numerical calculation (by Monte Carlo methods) of the effective diffusivity for a particle moving in a periodic divergence-free velocity field, in the limit of vanishing molecular diffusion. In this limit traditional numerical methods typically fail, since they do not represent accurately the geometry of the underlying deterministic dynamics. We propose a stochastic splitting method that takes into account the volume-preserving property of the equations of motion in the absence of noise, and when inertial effects can be neglected. An extension of the method is then proposed for the cases where the noise has a non-trivial time-correlation structure and when inertial effects cannot be neglected. Modified equations are used to perform backward error analysis. The new stochastic geometric integrators are shown to outperform standard Euler-based integrators. Various asymptotic limits of physical interest are investigated by means of numerical experiments, using the new integrators.
Submitted 20 June, 2008;
originally announced June 2008.
-
Homogenization for Inertial Particles in a Random Flow
Authors:
G. A. Pavliotis,
A. M. Stuart,
K. C. Zygalakis
Abstract:
We study the problem of homogenization for inertial particles moving in a time-dependent random velocity field and subject to molecular diffusion. We show that, under appropriate assumptions on the velocity field, the large-scale, long-time behavior of the inertial particles is governed by an effective diffusion equation for the position variable alone. This is achieved by the use of a formal multiple scales expansion in the scale parameter. The expansion relies on the hypoellipticity of the underlying diffusion. An expression for the diffusivity tensor is found and several of its properties are studied. The results of the formal multiscale analysis are justified rigorously by the use of the martingale central limit theorem. Our theoretical findings are supported by numerical investigations where we study the parametric dependence of the effective diffusivity on the various non-dimensional parameters of the problem.
Submitted 8 February, 2007;
originally announced February 2007.