-
Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification
Authors:
Mario Refoyo,
David Luengo
Abstract:
Deep Learning systems excel in complex tasks but often lack transparency, limiting their use in critical applications. Counterfactual explanations, a core tool within eXplainable Artificial Intelligence (XAI), offer insights into model decisions by identifying minimal changes to an input to alter its predicted outcome. However, existing methods for time series data are limited by univariate assumptions, rigid constraints on modifications, or lack of validity guarantees. This paper introduces Multi-SpaCE, a multi-objective counterfactual explanation method for multivariate time series. Using non-dominated ranking genetic algorithm II (NSGA-II), Multi-SpaCE balances proximity, sparsity, plausibility, and contiguity. Unlike most methods, it ensures perfect validity, supports multivariate data and provides a Pareto front of solutions, enabling flexibility to different end-user needs. Comprehensive experiments in diverse datasets demonstrate the ability of Multi-SpaCE to consistently achieve perfect validity and deliver superior performance compared to existing methods.
Submitted 10 June, 2025; v1 submitted 14 December, 2024;
originally announced January 2025.
-
A Survey of Monte Carlo Methods for Parameter Estimation
Authors:
D. Luengo,
L. Martino,
M. Bugallo,
V. Elvira,
S. Särkkä
Abstract:
Statistical signal processing applications usually require the estimation of some parameters of interest given a set of observed data. These estimates are typically obtained either by solving a multi-variate optimization problem, as in the maximum likelihood (ML) or maximum a posteriori (MAP) estimators, or by performing a multi-dimensional integration, as in the minimum mean squared error (MMSE) estimators. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and the Monte Carlo (MC) methodology is one feasible approach. MC methods proceed by drawing random samples, either from the desired distribution or from a simpler one, and using them to compute consistent estimators. The most important families of MC algorithms are Markov chain MC (MCMC) and importance sampling (IS). On the one hand, MCMC methods draw samples from a proposal density, building then an ergodic Markov chain whose stationary distribution is the desired distribution by accepting or rejecting those candidate samples as the new state of the chain. On the other hand, IS techniques draw samples from a simple proposal density, and then assign them suitable weights that measure their quality in some appropriate way. In this paper, we perform a thorough review of MC methods for the estimation of static parameters in signal processing applications. A historical note on the development of MC schemes is also provided, followed by the basic MC method and a brief description of the rejection sampling (RS) algorithm, as well as three sections describing many of the most relevant MCMC and IS algorithms, and their combined use.
Submitted 25 July, 2021;
originally announced July 2021.
-
Integrating Domain Knowledge in Data-driven Earth Observation with Process Convolutions
Authors:
Daniel Heestermans Svendsen,
Maria Piles,
Jordi Muñoz-Marí,
David Luengo,
Luca Martino,
Gustau Camps-Valls
Abstract:
The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, need the correct specification of all interactions between variables in the problem and the appropriate parameterization is a challenge in itself. On the other hand, machine learning approaches are flexible data-driven tools, able to approximate arbitrarily complex functions, but lack interpretability and struggle when data is scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approaches can address all these issues efficiently. We introduce Gaussian process (GP) convolution models for hybrid modelling in Earth observation (EO) problems. We specifically propose the use of a class of GP convolution models called latent force models (LFMs) for EO time series modelling, analysis and understanding. LFMs are hybrid models that incorporate physical knowledge encoded in differential equations into a multioutput GP model. LFMs can transfer information across time-series, cope with missing observations, infer explicit latent functions forcing the system, and learn parameterizations which are very helpful for system analysis and interpretability. We consider time series of soil moisture from active (ASCAT) and passive (SMOS, AMSR2) microwave satellites. We show how assuming a first order differential equation as governing equation, the model automatically estimates the e-folding time or decay rate related to soil moisture persistence and discovers latent forces related to precipitation. The proposed hybrid methodology reconciles the two main approaches in remote sensing parameter estimation by blending statistical learning and mechanistic modeling.
Submitted 16 April, 2021;
originally announced April 2021.
-
Physics-Aware Gaussian Processes in Remote Sensing
Authors:
Gustau Camps-Valls,
Luca Martino,
Daniel H. Svendsen,
Manuel Campos-Taberner,
Jordi Muñoz-Marí,
Valero Laparra,
David Luengo,
Francisco Javier García-Haro
Abstract:
Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression has excelled in biophysical parameter estimation tasks from airborne and satellite observations. GP regression is based on solid Bayesian statistics and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations between the state vector and the radiance observations is available though and could be useful to improve predictions and understanding. In this work, we review three GP models that respect and learn the physics of the underlying processes in the context of both forward and inverse modeling. After reviewing the traditional application of GPs for parameter retrieval, we introduce a Joint GP (JGP) model that combines in situ measurements and simulated data in a single GP model. Then, we present a latent force model (LFM) for GP modeling that encodes ordinary differential equations to blend data-driven modeling and physical constraints of the system governing equations. The LFM performs multi-output regression, adapts to the signal characteristics, is able to cope with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. Finally, we present an Automatic Gaussian Process Emulator (AGAPE) that approximates the forward physical model using concepts from Bayesian optimization and at the same time builds an optimally compact look-up-table for inversion. We give empirical evidence of the performance of these models through illustrative examples of vegetation monitoring and atmospheric modeling.
Submitted 7 December, 2020;
originally announced December 2020.
-
Heretical Multiple Importance Sampling
Authors:
Víctor Elvira,
Luca Martino,
David Luengo,
Mónica F. Bugallo
Abstract:
Multiple Importance Sampling (MIS) methods approximate moments of complicated distributions by drawing samples from a set of proposal distributions. Several ways to compute the importance weights assigned to each sample have been recently proposed, with the so-called deterministic mixture (DM) weights providing the best performance in terms of variance, at the expense of an increase in the computational cost. A recent work has shown that it is possible to achieve a trade-off between variance reduction and computational effort by performing an a priori random clustering of the proposals (partial DM algorithm). In this paper, we propose a novel "heretical" MIS framework, where the clustering is performed a posteriori with the goal of reducing the variance of the importance sampling weights. This approach yields biased estimators with a potentially large reduction in variance. Numerical examples show that heretical MIS estimators can outperform, in terms of mean squared error (MSE), both the standard and the partial MIS estimators, achieving a performance close to that of DM with less computational cost.
Submitted 15 September, 2016;
originally announced September 2016.
-
Improving Population Monte Carlo: Alternative Weighting and Resampling Schemes
Authors:
Víctor Elvira,
Luca Martino,
David Luengo,
Mónica F. Bugallo
Abstract:
Population Monte Carlo (PMC) sampling methods are powerful tools for approximating distributions of static unknowns given a set of observations. These methods are iterative in nature: at each step they generate samples from a proposal distribution and assign them weights according to the importance sampling principle. Critical issues in applying PMC methods are the choice of the generating functions for the samples and the avoidance of the sample degeneracy. In this paper, we propose three new schemes that considerably improve the performance of the original PMC formulation by allowing for better exploration of the space of unknowns and by selecting more adequately the surviving samples. A theoretical analysis is performed, proving the superiority of the novel schemes in terms of variance of the associated estimators and preservation of the sample diversity. Furthermore, we show that they outperform other state of the art algorithms (both in terms of mean square error and robustness w.r.t. initialization) through extensive numerical simulations.
Submitted 10 July, 2016;
originally announced July 2016.
-
Generalized Multiple Importance Sampling
Authors:
Víctor Elvira,
Luca Martino,
David Luengo,
Mónica F. Bugallo
Abstract:
Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework, and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.
Submitted 3 November, 2019; v1 submitted 10 November, 2015;
originally announced November 2015.
-
Parallel Metropolis chains with cooperative adaptation
Authors:
L. Martino,
V. Elvira,
D. Luengo,
F. Louzada
Abstract:
Monte Carlo methods, such as Markov chain Monte Carlo (MCMC) algorithms, have become very popular in signal processing over the last years. In this work, we introduce a novel MCMC scheme where parallel MCMC chains interact, adapting cooperatively the parameters of their proposal functions. Furthermore, the novel algorithm distributes the computational effort adaptively, rewarding the chains which are providing better performance and possibly even stopping other ones. These extinct chains can be reactivated if the algorithm considers it necessary. Numerical simulations show the benefits of the novel scheme.
Submitted 26 September, 2015;
originally announced September 2015.
-
Orthogonal parallel MCMC methods for sampling and optimization
Authors:
L. Martino,
V. Elvira,
D. Luengo,
J. Corander,
F. Louzada
Abstract:
Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes in order to reduce the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and the choice of the parameters.
Submitted 25 September, 2016; v1 submitted 30 July, 2015;
originally announced July 2015.
-
Efficient Multiple Importance Sampling Estimators
Authors:
Víctor Elvira,
Luca Martino,
David Luengo,
Mónica F. Bugallo
Abstract:
Multiple importance sampling (MIS) methods use a set of proposal distributions from which samples are drawn. Each sample is then assigned an importance weight that can be obtained according to different strategies. This work is motivated by the trade-off between variance reduction and computational complexity of the different approaches (classical vs. deterministic mixture) available for the weight calculation. A new method that achieves an efficient compromise between both factors is introduced in this paper. It is based on forming a partition of the set of proposal distributions and computing the weights accordingly. Computer simulations show the excellent performance of the associated partial deterministic mixture MIS estimator.
Submitted 20 May, 2015;
originally announced May 2015.
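The two weighting strategies contrasted in the abstract above (classical vs. deterministic mixture) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: Gaussian proposals with a common standard deviation, one sample per proposal, a normalized target density, and a self-normalized estimator of the target mean. The names normal_pdf and mis_mean_estimates are hypothetical, and the partial deterministic mixture scheme proposed in the paper is not implemented here.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a Gaussian N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mis_mean_estimates(target_pdf, mus, sigma, rng):
    """Estimate the target mean with classical and deterministic mixture weights.

    Assumption: one sample is drawn from each Gaussian proposal N(mu_n, sigma^2).
    Returns the pair (classical estimate, deterministic mixture estimate).
    """
    n = len(mus)
    xs = [rng.gauss(mu, sigma) for mu in mus]

    # Classical weights: each sample is divided by its own proposal density only.
    w_std = [target_pdf(x) / normal_pdf(x, mu, sigma) for x, mu in zip(xs, mus)]

    # Deterministic mixture weights: each sample is divided by the equally
    # weighted mixture of all proposals, which costs n evaluations per sample.
    w_dm = [
        target_pdf(x) / (sum(normal_pdf(x, m, sigma) for m in mus) / n)
        for x in xs
    ]

    def self_normalized_mean(ws):
        return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

    return self_normalized_mean(w_std), self_normalized_mean(w_dm)
```

The deterministic mixture weights need n proposal evaluations per sample instead of one, which is the extra computational cost the abstract refers to.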
-
Layered Adaptive Importance Sampling
Authors:
L. Martino,
V. Elvira,
D. Luengo,
J. Corander
Abstract:
Monte Carlo methods represent the "de facto" standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.
Submitted 27 November, 2016; v1 submitted 18 May, 2015;
originally announced May 2015.
-
Scalable Multi-Output Label Prediction: From Classifier Chains to Classifier Trellises
Authors:
J. Read,
L. Martino,
P. Olmos,
D. Luengo
Abstract:
Multi-output inference tasks, such as multi-label classification, have become increasingly important in recent years. A popular method for multi-label classification is classifier chains, in which the predictions of individual classifiers are cascaded along a chain, thus taking into account inter-label dependencies and improving the overall performance. Several varieties of classifier chain methods have been introduced, and many of them perform very competitively across a wide range of benchmark datasets. However, scalability limitations become apparent on larger datasets when modeling a fully-cascaded chain. In particular, the methods' strategies for discovering and modeling a good chain structure constitute a major computational bottleneck. In this paper, we present the classifier trellis (CT) method for scalable multi-label classification. We compare CT with several recently proposed classifier chain methods to show that it occupies an important niche: it is highly competitive on standard multi-label problems, yet it can also scale up to thousands or even tens of thousands of labels.
Submitted 20 January, 2015;
originally announced January 2015.
-
Adaptive Independent Sticky MCMC algorithms
Authors:
L. Martino,
R. Casarin,
F. Leisen,
D. Luengo
Abstract:
In this work, we introduce a novel class of adaptive Monte Carlo methods, called adaptive independent sticky MCMC algorithms, for efficient sampling from a generic target probability density function (pdf). The new class of algorithms employs adaptive non-parametric proposal densities which become closer and closer to the target as the number of iterations increases. The proposal pdf is built using interpolation procedures based on a set of support points which is constructed iteratively based on previously drawn samples. The algorithm's efficiency is ensured by a test that controls the evolution of the set of support points. This extra stage controls the computational cost and the convergence of the proposal density to the target. Each part of the novel family of algorithms is discussed and several examples are provided. Although the novel algorithms are presented for univariate target densities, we show that they can be easily extended to the multivariate context within a Gibbs-type sampler. The ergodicity is ensured and discussed. Exhaustive numerical examples illustrate the efficiency of sticky schemes, both as stand-alone methods to sample from complicated one-dimensional pdfs and within Gibbs in order to draw from multi-dimensional target distributions.
Submitted 2 January, 2016; v1 submitted 17 August, 2013;
originally announced August 2013.
-
Extremely efficient generation of Gamma random variables for α ≥ 1
Authors:
Luca Martino,
David Luengo
Abstract:
The Gamma distribution is well-known and widely used in many signal processing and communications applications. In this letter, a simple and extremely efficient accept/reject algorithm is introduced for the generation of independent random variables from a Gamma distribution with any shape parameter α ≥ 1. The proposed method uses another Gamma distribution with integer α_p ≤ α, from which samples can be easily drawn, as proposal function. For this reason, the new technique attains a higher acceptance rate (AR) for α ≥ 3 than all the methods currently available in the literature, with the AR tending to 1 as α diverges.
Submitted 25 June, 2013; v1 submitted 13 April, 2013;
originally announced April 2013.
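The accept/reject idea described in the abstract above can be illustrated with a short Python sketch that draws Gamma(α, β) samples (β being a rate parameter) from a Gamma proposal with integer shape. The specific choices α_p = floor(α) and β_p = β·α_p/α, as well as the function name sample_gamma, are assumptions made for this sketch and are not confirmed by the listing; the letter itself should be consulted for the actual algorithm and its acceptance-rate analysis.

```python
import math
import random


def sample_gamma(alpha, beta, rng):
    """Draw one sample from Gamma(shape=alpha, rate=beta), alpha >= 1, by rejection.

    Assumed proposal: Gamma(alpha_p, beta_p) with alpha_p = floor(alpha) and
    beta_p = beta * alpha_p / alpha (a hypothetical choice for this sketch).
    """
    if alpha < 1.0 or beta <= 0.0:
        raise ValueError("this sketch requires alpha >= 1 and beta > 0")

    alpha_p = math.floor(alpha)
    beta_p = beta * alpha_p / alpha
    # The ratio target/proposal, proportional to
    # x**(alpha - alpha_p) * exp(-(beta - beta_p) * x), peaks at x_star.
    x_star = alpha / beta

    while True:
        # A Gamma variable with integer shape is a sum of alpha_p exponentials.
        x = -sum(math.log(1.0 - rng.random()) for _ in range(alpha_p)) / beta_p
        if x <= 0.0:
            continue  # guard against log(0) in the extremely unlikely case x == 0
        # Log of the acceptance probability: the ratio divided by its maximum.
        log_accept = (alpha - alpha_p) * math.log(x / x_star) \
            - (beta - beta_p) * (x - x_star)
        if math.log(1.0 - rng.random()) <= log_accept:
            return x
```

When α is an integer the proposal coincides with the target and every candidate is accepted.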
-
Blind Analysis of EGM Signals: Sparsity-Aware Formulation
Authors:
David Luengo,
Javier Via,
Sandra Monzon,
Tom Trigano,
Antonio Artes-Rodriguez
Abstract:
This technical note considers the problems of blind sparse learning and inference of electrogram (EGM) signals under atrial fibrillation (AF) conditions. First of all we introduce a mathematical model for the observed signals that takes into account the multiple foci typically appearing inside the heart during AF. Then we propose a reconstruction model based on a fixed dictionary and discuss several alternatives for choosing the dictionary. In order to obtain a sparse solution that takes into account the biological restrictions of the problem, a first alternative is using LASSO regularization followed by a post-processing stage that removes low amplitude coefficients violating the refractory period characteristic of cardiac cells. As an alternative we propose a novel regularization term, called cross products LASSO (CP-LASSO), that is able to incorporate the biological constraints directly into the optimization problem. Unfortunately, the resulting problem is non-convex, but we show how it can be solved efficiently in an approximated way making use of successive convex approximations (SCA). Finally, spectral analysis is performed on the clean activation sequence obtained from the sparse learning stage in order to estimate the number of latent foci and their frequencies. Simulations on synthetic and real data are provided to validate the proposed approach.
Submitted 31 December, 2012;
originally announced December 2012.
-
Fully Adaptive Gaussian Mixture Metropolis-Hastings Algorithm
Authors:
David Luengo,
Luca Martino
Abstract:
Markov Chain Monte Carlo methods are widely used in signal processing and communications for statistical inference and stochastic optimization. In this work, we introduce an efficient adaptive Metropolis-Hastings algorithm to draw samples from generic multi-modal and multi-dimensional target distributions. The proposal density is a mixture of Gaussian densities with all parameters (weights, mean vectors and covariance matrices) updated using all the previously generated samples applying simple recursive rules. Numerical results for the one and two-dimensional cases are provided.
Submitted 15 March, 2013; v1 submitted 1 December, 2012;
originally announced December 2012.
-
Efficient Monte Carlo Methods for Multi-Dimensional Learning with Classifier Chains
Authors:
Jesse Read,
Luca Martino,
David Luengo
Abstract:
Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance - at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.
Submitted 7 September, 2013; v1 submitted 9 November, 2012;
originally announced November 2012.
-
Improved Adaptive Rejection Metropolis Sampling Algorithms
Authors:
Luca Martino,
Jesse Read,
David Luengo
Abstract:
Markov Chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings (MH) algorithm, are widely used for Bayesian inference. One of the most important issues for any MCMC method is the convergence of the Markov chain, which depends crucially on a suitable choice of the proposal density. Adaptive Rejection Metropolis Sampling (ARMS) is a well-known MH scheme that generates samples from one-dimensional target densities making use of adaptive piecewise proposals constructed using support points taken from rejected samples. In this work we pinpoint a crucial drawback in the adaptive procedure in ARMS: support points might never be added inside regions where the proposal is below the target. When this happens in many regions it leads to a poor performance of ARMS, with the proposal never converging to the target. In order to overcome this limitation we propose two improved adaptive schemes for constructing the proposal. The first one is a direct modification of the ARMS procedure that incorporates support points inside regions where the proposal is below the target, while satisfying the diminishing adaptation property, one of the required conditions to assure the convergence of the Markov chain. The second one is an adaptive independent MH algorithm with the ability to learn from all previous samples except for the current state of the chain, thus also guaranteeing the convergence to the invariant density. These two new schemes improve the adaptive strategy of ARMS, thus simplifying the complexity in the construction of the proposals. Numerical results show that the new techniques provide better performance w.r.t. the standard ARMS.
Submitted 8 October, 2012; v1 submitted 24 May, 2012;
originally announced May 2012.
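As a rough illustration of the independent MH building block mentioned in the abstract above, the following minimal sketch runs a non-adaptive independence sampler. It is not the adaptive scheme proposed in the paper: the proposal here is a fixed Gaussian rather than an adaptive piecewise construction, and the target (a standard normal), the proposal scale and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of a standard normal (illustrative target, an assumption).
    return -0.5 * x * x


def log_proposal(x, scale):
    # Unnormalized log-density of N(0, scale^2); the constant cancels in the MH ratio.
    return -0.5 * (x / scale) ** 2


def independent_mh(n, scale=2.0, seed=0):
    """Independence Metropolis-Hastings with a fixed (non-adaptive) Gaussian proposal."""
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(n):
        # The candidate is drawn independently of the current state.
        cand = rng.gauss(0.0, scale)
        # log of [p(cand) / q(cand)] / [p(x) / q(x)]
        log_alpha = (log_target(cand) - log_proposal(cand, scale)) - (
            log_target(x) - log_proposal(x, scale)
        )
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_alpha:
            x = cand
        chain.append(x)
    return chain
```

An adaptive variant would update the proposal from past samples between iterations; the abstract notes that excluding the current state from that update is what preserves convergence to the invariant density.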
-
On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling
Authors:
Luca Martino,
David Luengo,
Joaquín Míguez
Abstract:
In this work we investigate the relationship among three classical sampling techniques: the inverse of density (Khintchine's theorem), the transformed rejection (TR) and the generalized ratio of uniforms (GRoU). Given a monotonic probability density function (PDF), we show that the transformed area obtained using the generalized ratio of uniforms method can be found equivalently by applying the transformed rejection sampling approach to the inverse function of the target density. Then we provide an extension of the classical inverse of density idea, showing that it is completely equivalent to the GRoU method for monotonic densities. Although we concentrate on monotonic PDFs, we also discuss how the results presented here can be extended to any non-monotonic PDF that can be decomposed into a collection of intervals where it is monotonically increasing or decreasing. In this general case, we show how transformations of certain random variables and the generalized inverse PDF are connected with the GRoU technique. Finally, we also introduce a GRoU technique to handle unbounded target densities.
Submitted 16 July, 2013; v1 submitted 2 May, 2012;
originally announced May 2012.
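For readers unfamiliar with the ratio-of-uniforms idea that the abstract above generalizes, here is a minimal sketch of the classical (non-generalized) method. It is only the standard special case, not the GRoU construction studied in the paper. The standard normal target and the function name are assumptions made for illustration. With unnormalized density p(x) = exp(-x^2/2), points (u, v) are drawn uniformly in a rectangle enclosing the region 0 < u <= sqrt(p(v/u)), and x = v/u is returned for accepted points.

```python
import math
import random


def rou_standard_normal(n, seed=0):
    """Classical ratio-of-uniforms sampler for p(x) proportional to exp(-x^2 / 2)."""
    rng = random.Random(seed)
    # Bounding rectangle: sup sqrt(p) = 1 and sup |x| * sqrt(p(x)) = sqrt(2 / e).
    v_max = math.sqrt(2.0 / math.e)
    samples = []
    while len(samples) < n:
        u = 1.0 - rng.random()          # uniform on (0, 1], avoids division by zero
        v = rng.uniform(-v_max, v_max)  # uniform on [-v_max, v_max]
        x = v / u
        # Accept when (u, v) lies in the region u <= sqrt(p(v / u)).
        if u * u <= math.exp(-0.5 * x * x):
            samples.append(x)
    return samples
```

The accepted ratios x = v/u are distributed according to the target, so no normalizing constant of p is needed.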
-
Linear Latent Force Models using Gaussian Processes
Authors:
Mauricio A. Álvarez,
David Luengo,
Neil D. Lawrence
Abstract:
Purely data-driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data-driven modelling with a physical model of the system. We show how different, physically inspired kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.
Submitted 13 March, 2020; v1 submitted 13 July, 2011;
originally announced July 2011.
-
Variational Inducing Kernels for Sparse Convolved Multiple Output Gaussian Processes
Authors:
Mauricio A. Álvarez,
David Luengo,
Michalis K. Titsias,
Neil D. Lawrence
Abstract:
Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Alvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potentially non-smooth functions involved in the kernel CP construction, and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.
Submitted 16 December, 2009;
originally announced December 2009.