-
Gaussian Invariant Markov Chain Monte Carlo
Authors:
Michalis K. Titsias,
Angelos Alexopoulos,
Siran Liu,
Petros Dellaportas
Abstract:
We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that…
▽ More
We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy to use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high dimensional targets in latent Gaussian models where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
△ Less
Submitted 26 June, 2025;
originally announced June 2025.
-
Probabilistic Wind Power Forecasting via Non-Stationary Gaussian Processes
Authors:
Domniki Ladopoulou,
Dat Minh Hong,
Petros Dellaportas
Abstract:
Accurate probabilistic forecasting of wind power is essential for maintaining grid stability and enabling efficient integration of renewable energy sources. Gaussian Process (GP) models offer a principled framework for quantifying uncertainty; however, conventional approaches rely on stationary kernels, which are inadequate for modeling the inherently non-stationary nature of wind speed and power…
▽ More
Accurate probabilistic forecasting of wind power is essential for maintaining grid stability and enabling efficient integration of renewable energy sources. Gaussian Process (GP) models offer a principled framework for quantifying uncertainty; however, conventional approaches rely on stationary kernels, which are inadequate for modeling the inherently non-stationary nature of wind speed and power output. We propose a non-stationary GP framework that incorporates the generalized spectral mixture (GSM) kernel, enabling the model to capture time-varying patterns and heteroscedastic behaviors in wind speed and wind power data. We evaluate the performance of the proposed model on real-world SCADA data across short\mbox{-,} medium-, and long-term forecasting horizons. Compared to standard radial basis function and spectral mixture kernels, the GSM-based model outperforms, particularly in short-term forecasts. These results highlight the necessity of modeling non-stationarity in wind power forecasting and demonstrate the practical value of non-stationary GP models in operational settings.
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
Variance Reduction for the Independent Metropolis Sampler
Authors:
Siran Liu,
Petros Dellaportas,
Michalis K. Titsias
Abstract:
Assume that we would like to estimate the expected value of a function $F$ with respect to an intractable density $π$, which is specified up to some unknown normalising constant. We prove that if $π$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $π$ with proposal density $q$, enriched with a variance reduction comp…
▽ More
Assume that we would like to estimate the expected value of a function $F$ with respect to an intractable density $π$, which is specified up to some unknown normalising constant. We prove that if $π$ is close enough under KL divergence to another density $q$, an independent Metropolis sampler estimator that obtains samples from $π$ with proposal density $q$, enriched with a variance reduction computational strategy based on control variates, achieves smaller asymptotic variance than i.i.d.\ sampling from $π$. The control variates construction requires no extra computational effort but assumes that the expected value of $F$ under $q$ is analytically available. We illustrate this result by calculating the marginal likelihood in a linear regression model with prior-likelihood conflict and a non-conjugate prior. Furthermore, we propose an adaptive independent Metropolis algorithm that adapts the proposal density such that its KL divergence with the target is being reduced. We demonstrate its applicability in a Bayesian logistic and Gaussian process regression problems and we rigorously justify our asymptotic arguments under easily verifiable and essentially minimal conditions.
△ Less
Submitted 15 October, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Probabilistic Multi-Layer Perceptrons for Wind Farm Condition Monitoring
Authors:
Filippo Fiocchi,
Domna Ladopoulou,
Petros Dellaportas
Abstract:
We provide a condition monitoring system for wind farms, based on normal behaviour modelling using a probabilistic multi-layer perceptron with transfer learning via fine-tuning. The model predicts the output power of the wind turbine under normal behaviour based on features retrieved from supervisory control and data acquisition (SCADA) systems. Its advantages are that (i) it can be trained with S…
▽ More
We provide a condition monitoring system for wind farms, based on normal behaviour modelling using a probabilistic multi-layer perceptron with transfer learning via fine-tuning. The model predicts the output power of the wind turbine under normal behaviour based on features retrieved from supervisory control and data acquisition (SCADA) systems. Its advantages are that (i) it can be trained with SCADA data of at least a few years, (ii) it can incorporate all SCADA data of all wind turbines in a wind farm as features, (iii) it assumes that the output power follows a normal density with heteroscedastic variance and (iv) it can predict the output of one wind turbine by borrowing strength from the data of all other wind turbines in a farm. Probabilistic guidelines for condition monitoring are given via a cumulative sum (CUSUM) control chart, which is specifically designed based on a real-data classification exercise and, hence, is adapted to the needs of a wind farm. We illustrate the performance of our model in a real SCADA data example which provides evidence that it outperforms other probabilistic prediction models.
△ Less
Submitted 26 February, 2025; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Learning variational autoencoders via MCMC speed measures
Authors:
Marcel Hirt,
Vasileios Kreouzis,
Petros Dellaportas
Abstract:
Variational autoencoders (VAEs) are popular likelihood-based generative models which can be efficiently trained by maximizing an Evidence Lower Bound (ELBO). There has been much progress in improving the expressiveness of the variational distribution to obtain tighter variational bounds and increased generative performance. Whilst previous work has leveraged Markov chain Monte Carlo (MCMC) methods…
▽ More
Variational autoencoders (VAEs) are popular likelihood-based generative models which can be efficiently trained by maximizing an Evidence Lower Bound (ELBO). There has been much progress in improving the expressiveness of the variational distribution to obtain tighter variational bounds and increased generative performance. Whilst previous work has leveraged Markov chain Monte Carlo (MCMC) methods for the construction of variational densities, gradient-based methods for adapting the proposal distributions for deep latent variable models have received less attention. This work suggests an entropy-based adaptation for a short-run Metropolis-adjusted Langevin (MALA) or Hamiltonian Monte Carlo (HMC) chain while optimising a tighter variational bound to the log-evidence. Experiments show that this approach yields higher held-out log-likelihoods as well as improved generative metrics. Our implicit variational density can adapt to complicated posterior geometries of latent hierarchical representations arising in hierarchical VAEs.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
How Good are Low-Rank Approximations in Gaussian Process Regression?
Authors:
Constantinos Daskalakis,
Petros Dellaportas,
Aristeidis Panos
Abstract:
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as betwee…
▽ More
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
△ Less
Submitted 21 February, 2022; v1 submitted 12 December, 2021;
originally announced December 2021.
-
Scalable Marked Point Processes for Exchangeable and Non-Exchangeable Event Sequences
Authors:
Aristeidis Panos,
Ioannis Kosmidis,
Petros Dellaportas
Abstract:
We adopt the interpretability offered by a parametric, Hawkes-process-inspired conditional probability mass function for the marks and apply variational inference techniques to derive a general and scalable inferential framework for marked point processes. The framework can handle both exchangeable and non-exchangeable event sequences with minimal tuning and without any pre-training. This contrast…
▽ More
We adopt the interpretability offered by a parametric, Hawkes-process-inspired conditional probability mass function for the marks and apply variational inference techniques to derive a general and scalable inferential framework for marked point processes. The framework can handle both exchangeable and non-exchangeable event sequences with minimal tuning and without any pre-training. This contrasts with many parametric and non-parametric state-of-the-art methods that typically require pre-training and/or careful tuning, and can only handle exchangeable event sequences. The framework's competitive computational and predictive performance against other state-of-the-art methods are illustrated through real data experiments. Its attractiveness for large-scale applications is demonstrated through a case study involving all events occurring in an English Premier League season.
△ Less
Submitted 19 February, 2023; v1 submitted 30 May, 2021;
originally announced May 2021.
-
How Good are Low-Rank Approximations in Gaussian Process Regression?
Authors:
Constantinos Daskalakis,
Petros Dellaportas,
Aristeidis Panos
Abstract:
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as betwee…
▽ More
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
△ Less
Submitted 14 December, 2021; v1 submitted 3 April, 2020;
originally announced April 2020.
-
Gradient-based Adaptive Markov Chain Monte Carlo
Authors:
Michalis K. Titsias,
Petros Dellaportas
Abstract:
We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our met…
▽ More
We introduce a gradient-based learning method to automatically adapt Markov chain Monte Carlo (MCMC) proposal distributions to intractable targets. We define a maximum entropy regularised objective function, referred to as generalised speed measure, which can be robustly optimised over the parameters of the proposal distribution by applying stochastic gradient optimisation. An advantage of our method compared to traditional adaptive MCMC methods is that the adaptation occurs even when candidate state values are rejected. This is a highly desirable property of any adaptation strategy because the adaptation starts in early iterations even if the initial proposal distribution is far from optimum. We apply the framework for learning multivariate random walk Metropolis and Metropolis-adjusted Langevin proposals with full covariance matrices, and provide empirical evidence that our method can outperform other MCMC algorithms, including Hamiltonian Monte Carlo schemes.
△ Less
Submitted 6 January, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Copula-like Variational Inference
Authors:
Marcel Hirt,
Petros Dellaportas,
Alain Durmus
Abstract:
This paper considers a new family of variational distributions motivated by Sklar's theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of state space. Then, the proposed variational densities that we suggest can be seen as arising from these copula-like densities use…
▽ More
This paper considers a new family of variational distributions motivated by Sklar's theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension of state space. Then, the proposed variational densities that we suggest can be seen as arising from these copula-like densities used as base distributions on the hypercube with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity $\mathcal{O}(d \log d)$. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
△ Less
Submitted 22 December, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Fully Scalable Gaussian Processes using Subspace Inducing Inputs
Authors:
Aristeidis Panos,
Petros Dellaportas,
Michalis K. Titsias
Abstract:
We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of treating a high number of training instances together with high dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with certain matrix-preconditioning based parametrizations of the variational distributions th…
▽ More
We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of treating a high number of training instances together with high dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with certain matrix-preconditioning based parametrizations of the variational distributions that lead to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems with the extra burden of the very large number of class labels. We demonstrate the usefulness of our approach by presenting predictive performances together with low computational times in datasets with extremely large number of instances and input dimensions.
△ Less
Submitted 12 July, 2018; v1 submitted 6 July, 2018;
originally announced July 2018.
-
Scalable Bayesian Learning for State Space Models using Variational Inference with SMC Samplers
Authors:
Marcel Hirt,
Petros Dellaportas
Abstract:
We present a scalable approach to performing approximate fully Bayesian inference in generic state space models. The proposed method is an alternative to particle MCMC that provides fully Bayesian inference of both the dynamic latent states and the static parameters of the model. We build up on recent advances in computational statistics that combine variational methods with sequential Monte Carlo…
▽ More
We present a scalable approach to performing approximate fully Bayesian inference in generic state space models. The proposed method is an alternative to particle MCMC that provides fully Bayesian inference of both the dynamic latent states and the static parameters of the model. We build up on recent advances in computational statistics that combine variational methods with sequential Monte Carlo sampling and we demonstrate the advantages of performing full Bayesian inference over the static parameters rather than just performing variational EM approximations. We illustrate how our approach enables scalable inference in multivariate stochastic volatility models and self-exciting point process models that allow for flexible dynamics in the latent intensity function.
△ Less
Submitted 12 February, 2019; v1 submitted 23 May, 2018;
originally announced May 2018.