Search | arXiv e-print repository

arXiv:2501.04009 [pdf, ps, other]

Multi-SpaCE: Multi-Objective Subsequence-based Sparse Counterfactual Explanations for Multivariate Time Series Classification

Authors: Mario Refoyo, David Luengo

Abstract: Deep Learning systems excel in complex tasks but often lack transparency, limiting their use in critical applications. Counterfactual explanations, a core tool within eXplainable Artificial Intelligence (XAI), offer insights into model decisions by identifying minimal changes to an input to alter its predicted outcome. However, existing methods for time series data are limited by univariate assumptions, rigid constraints on modifications, or lack of validity guarantees. This paper introduces Multi-SpaCE, a multi-objective counterfactual explanation method for multivariate time series. Using non-dominated ranking genetic algorithm II (NSGA-II), Multi-SpaCE balances proximity, sparsity, plausibility, and contiguity. Unlike most methods, it ensures perfect validity, supports multivariate data and provides a Pareto front of solutions, enabling flexibility to different end-user needs. Comprehensive experiments in diverse datasets demonstrate the ability of Multi-SpaCE to consistently achieve perfect validity and deliver superior performance compared to existing methods.

Submitted 10 June, 2025; v1 submitted 14 December, 2024; originally announced January 2025.

arXiv:2107.11820 [pdf, other]

doi 10.1186/s13634-020-00675-6

A Survey of Monte Carlo Methods for Parameter Estimation

Authors: D. Luengo, L. Martino, M. Bugallo, V. Elvira, S. Särkkä

Abstract: Statistical signal processing applications usually require the estimation of some parameters of interest given a set of observed data. These estimates are typically obtained either by solving a multi-variate optimization problem, as in the maximum likelihood (ML) or maximum a posteriori (MAP) estimators, or by performing a multi-dimensional integration, as in the minimum mean squared error (MMSE) estimators. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and the Monte Carlo (MC) methodology is one feasible approach. MC methods proceed by drawing random samples, either from the desired distribution or from a simpler one, and using them to compute consistent estimators. The most important families of MC algorithms are Markov chain MC (MCMC) and importance sampling (IS). On the one hand, MCMC methods draw samples from a proposal density and then build an ergodic Markov chain whose stationary distribution is the desired distribution by accepting or rejecting those candidate samples as the new state of the chain. On the other hand, IS techniques draw samples from a simple proposal density, and then assign them suitable weights that measure their quality in some appropriate way. In this paper, we perform a thorough review of MC methods for the estimation of static parameters in signal processing applications. A historical note on the development of MC schemes is also provided, followed by the basic MC method and a brief description of the rejection sampling (RS) algorithm, as well as three sections describing many of the most relevant MCMC and IS algorithms, and their combined use.

Submitted 25 July, 2021; originally announced July 2021.

Journal ref: EURASIP Journal on Advances in Signal Processing, Volume 2020, Article number: 25 (2020)
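
The accept/reject construction of a Markov chain described in the abstract above can be illustrated with a minimal random-walk Metropolis sketch. This is not code from the survey: the function name `random_walk_metropolis`, the standard normal target, the step size 2.4 and the burn-in length are all assumptions chosen only for illustration.

```python
import numpy as np

def random_walk_metropolis(log_target, x0, n_steps, step, rng):
    """Random-walk Metropolis chain for a 1-D unnormalised log-density."""
    chain = np.empty(n_steps)
    x, lp = x0, log_target(x0)
    for t in range(n_steps):
        # Candidate drawn from a Gaussian proposal centred on the current state.
        cand = x + step * rng.standard_normal()
        lp_cand = log_target(cand)
        # Accept with probability min(1, target(cand) / target(x)).
        # 1 - random() lies in (0, 1], which avoids log(0).
        if np.log(1.0 - rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
        chain[t] = x  # on rejection the previous state is repeated
    return chain

rng = np.random.default_rng(0)
# Illustrative target: unnormalised standard normal, log p(x) = -x^2 / 2.
chain = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 20000, 2.4, rng)
samples = chain[1000:]  # discard an (assumed) burn-in period
print(samples.mean(), samples.var())
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target values, so the normalising constant of the target is never needed.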

arXiv:2104.08134 [pdf, other]

doi 10.1109/TGRS.2021.3059550

Integrating Domain Knowledge in Data-driven Earth Observation with Process Convolutions

Authors: Daniel Heestermans Svendsen, Maria Piles, Jordi Muñoz-Marí, David Luengo, Luca Martino, Gustau Camps-Valls

Abstract: The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, need the correct specification of all interactions between variables in the problem and the appropriate parameterization is a challenge in itself. On the other hand, machine learning approaches are flexible data-driven tools, able to approximate arbitrarily complex functions, but lack interpretability and struggle when data is scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approaches can address all these issues efficiently. We introduce Gaussian process (GP) convolution models for hybrid modelling in Earth observation (EO) problems. We specifically propose the use of a class of GP convolution models called latent force models (LFMs) for EO time series modelling, analysis and understanding. LFMs are hybrid models that incorporate physical knowledge encoded in differential equations into a multioutput GP model. LFMs can transfer information across time-series, cope with missing observations, infer explicit latent functions forcing the system, and learn parameterizations which are very helpful for system analysis and interpretability. We consider time series of soil moisture from active (ASCAT) and passive (SMOS, AMSR2) microwave satellites. We show how, assuming a first-order differential equation as the governing equation, the model automatically estimates the e-folding time or decay rate related to soil moisture persistence and discovers latent forces related to precipitation. The proposed hybrid methodology reconciles the two main approaches in remote sensing parameter estimation by blending statistical learning and mechanistic modeling.

Submitted 16 April, 2021; originally announced April 2021.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-15, 2022, Art no. 4401715

arXiv:2012.07986 [pdf, other]

doi 10.1016/j.asoc.2018.03.021

Physics-Aware Gaussian Processes in Remote Sensing

Authors: Gustau Camps-Valls, Luca Martino, Daniel H. Svendsen, Manuel Campos-Taberner, Jordi Muñoz-Marí, Valero Laparra, David Luengo, Francisco Javier García-Haro

Abstract: Earth observation from satellite sensory data poses challenging problems, where machine learning is currently a key player. In recent years, Gaussian Process (GP) regression has excelled in biophysical parameter estimation tasks from airborne and satellite observations. GP regression is based on solid Bayesian statistics and generally yields efficient and accurate parameter estimates. However, GPs are typically used for inverse modeling based on concurrent observations and in situ measurements only. Very often a forward model encoding the well-understood physical relations between the state vector and the radiance observations is available though and could be useful to improve predictions and understanding. In this work, we review three GP models that respect and learn the physics of the underlying processes in the context of both forward and inverse modeling. After reviewing the traditional application of GPs for parameter retrieval, we introduce a Joint GP (JGP) model that combines in situ measurements and simulated data in a single GP model. Then, we present a latent force model (LFM) for GP modeling that encodes ordinary differential equations to blend data-driven modeling and physical constraints of the system governing equations. The LFM performs multi-output regression, adapts to the signal characteristics, is able to cope with missing data in the time series, and provides explicit latent functions that allow system analysis and evaluation. Finally, we present an Automatic Gaussian Process Emulator (AGAPE) that approximates the forward physical model using concepts from Bayesian optimization and at the same time builds an optimally compact look-up-table for inversion. We give empirical evidence of the performance of these models through illustrative examples of vegetation monitoring and atmospheric modeling.

Submitted 7 December, 2020; originally announced December 2020.

Journal ref: Applied Soft Computing Volume 68, July 2018, Pages 69-82

arXiv:1609.04740 [pdf, other]

doi 10.1109/LSP.2016.2600678

Heretical Multiple Importance Sampling

Authors: Víctor Elvira, Luca Martino, David Luengo, Mónica F. Bugallo

Abstract: Multiple Importance Sampling (MIS) methods approximate moments of complicated distributions by drawing samples from a set of proposal distributions. Several ways to compute the importance weights assigned to each sample have been recently proposed, with the so-called deterministic mixture (DM) weights providing the best performance in terms of variance, at the expense of an increase in the computational cost. A recent work has shown that it is possible to achieve a trade-off between variance reduction and computational effort by performing an a priori random clustering of the proposals (partial DM algorithm). In this paper, we propose a novel "heretical" MIS framework, where the clustering is performed a posteriori with the goal of reducing the variance of the importance sampling weights. This approach yields biased estimators with a potentially large reduction in variance. Numerical examples show that heretical MIS estimators can outperform, in terms of mean squared error (MSE), both the standard and the partial MIS estimators, achieving a performance close to that of DM with less computational cost.

Submitted 15 September, 2016; originally announced September 2016.

Comments: 8 pages, 2 figures

Journal ref: IEEE Signal Processing Letters, Volume 23, Issue 10, October 2016

arXiv:1607.02758 [pdf, other]

doi 10.1016/j.sigpro.2016.07.012

Improving Population Monte Carlo: Alternative Weighting and Resampling Schemes

Authors: Víctor Elvira, Luca Martino, David Luengo, Mónica F. Bugallo

Abstract: Population Monte Carlo (PMC) sampling methods are powerful tools for approximating distributions of static unknowns given a set of observations. These methods are iterative in nature: at each step they generate samples from a proposal distribution and assign them weights according to the importance sampling principle. Critical issues in applying PMC methods are the choice of the generating functions for the samples and the avoidance of the sample degeneracy. In this paper, we propose three new schemes that considerably improve the performance of the original PMC formulation by allowing for better exploration of the space of unknowns and by selecting more adequately the surviving samples. A theoretical analysis is performed, proving the superiority of the novel schemes in terms of variance of the associated estimators and preservation of the sample diversity. Furthermore, we show that they outperform other state of the art algorithms (both in terms of mean square error and robustness w.r.t. initialization) through extensive numerical simulations.

Submitted 10 July, 2016; originally announced July 2016.

Comments: Signal Processing, 2016

Journal ref: Signal Processing Volume 131, February 2017, Pages 77-91

arXiv:1511.03095 [pdf, other]

doi 10.1214/18-STS668

Generalized Multiple Importance Sampling

Authors: Víctor Elvira, Luca Martino, David Luengo, Mónica F. Bugallo

Abstract: Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework, and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.

Submitted 3 November, 2019; v1 submitted 10 November, 2015; originally announced November 2015.

Journal ref: Statistical Science, Volume 34, Number 1 (2019), 129-155
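
As a rough illustration of how the interpretation of the weighting step matters, the sketch below compares two common MIS weightings when one sample is drawn from each proposal: the standard weight (target over the proposal that generated the sample) and the deterministic mixture weight (target over the equally weighted mixture of all proposals). It is not taken from the paper; the Gaussian proposals, their means, the unnormalised standard normal target and the helper names `normal_pdf` and `mis_weights` are assumptions made for this example.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mis_weights(x, target, mus, sigma):
    """Return (standard, deterministic-mixture) weights; x[n] came from proposal n."""
    own = normal_pdf(x, mus, sigma)                                  # q_n(x_n)
    mix = normal_pdf(x[:, None], mus[None, :], sigma).mean(axis=1)   # (1/N) sum_j q_j(x_n)
    t = target(x)
    return t / own, t / mix

rng = np.random.default_rng(0)
mus = np.linspace(-4.0, 4.0, 9)            # assumed proposal means
sigma = 1.0                                # assumed common proposal scale
target = lambda x: np.exp(-0.5 * x ** 2)   # unnormalised N(0, 1), so Z = sqrt(2*pi)

runs = 2000
z_std = np.empty(runs)
z_dm = np.empty(runs)
for r in range(runs):
    x = mus + sigma * rng.standard_normal(mus.size)  # one draw per proposal
    w_std, w_dm = mis_weights(x, target, mus, sigma)
    z_std[r] = w_std.mean()   # estimate of Z with standard weights
    z_dm[r] = w_dm.mean()     # estimate of Z with deterministic mixture weights

print("true Z:", np.sqrt(2.0 * np.pi))
print("standard MIS: mean", z_std.mean(), "variance", z_std.var())
print("DM MIS:       mean", z_dm.mean(), "variance", z_dm.var())
```

Both estimators target the same normalising constant, but the mixture denominator keeps the weights bounded when a sample lands far from the proposal that generated it, at the cost of evaluating every proposal at every sample.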

arXiv:1509.07993 [pdf, other]

doi 10.1109/ICASSP.2016.7472423

Parallel Metropolis chains with cooperative adaptation

Authors: L. Martino, V. Elvira, D. Luengo, F. Louzada

Abstract: Monte Carlo methods, such as Markov chain Monte Carlo (MCMC) algorithms, have become very popular in signal processing in recent years. In this work, we introduce a novel MCMC scheme where parallel MCMC chains interact, cooperatively adapting the parameters of their proposal functions. Furthermore, the novel algorithm distributes the computational effort adaptively, rewarding the chains which are providing better performance and possibly even stopping others. These extinct chains can be reactivated if the algorithm considers it necessary. Numerical simulations show the benefits of the novel scheme.

Submitted 26 September, 2015; originally announced September 2015.

arXiv:1507.08577 [pdf, other]

doi 10.1016/j.dsp.2016.07.013

Orthogonal parallel MCMC methods for sampling and optimization

Authors: L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

Abstract: Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes in order to reduce the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and the choice of the parameters.

Submitted 25 September, 2016; v1 submitted 30 July, 2015; originally announced July 2015.

Journal ref: Digital Signal Processing Volume 58, Pages: 64-84, 2016

arXiv:1505.05391 [pdf, other]

doi 10.1109/LSP.2015.2432078

Efficient Multiple Importance Sampling Estimators

Authors: Víctor Elvira, Luca Martino, David Luengo, Mónica F. Bugallo

Abstract: Multiple importance sampling (MIS) methods use a set of proposal distributions from which samples are drawn. Each sample is then assigned an importance weight that can be obtained according to different strategies. This work is motivated by the trade-off between variance reduction and computational complexity of the different approaches (classical vs. deterministic mixture) available for the weight calculation. A new method that achieves an efficient compromise between both factors is introduced in this paper. It is based on forming a partition of the set of proposal distributions and computing the weights accordingly. Computer simulations show the excellent performance of the associated partial deterministic mixture MIS estimator.

Submitted 20 May, 2015; originally announced May 2015.

Journal ref: IEEE Signal Processing Letters, VOL. 22, NO. 10, OCTOBER 2015

arXiv:1505.04732 [pdf, other]

doi 10.1007/s11222-016-9642-5

Layered Adaptive Importance Sampling

Authors: L. Martino, V. Elvira, D. Luengo, J. Corander

Abstract: Monte Carlo methods represent the "de facto" standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.

Submitted 27 November, 2016; v1 submitted 18 May, 2015; originally announced May 2015.

Comments: Related Matlab codes: an iterative version at http://www.lucamartino.altervista.org/CODE_LAIS_v03.zip and a non-iterative version at http://www.lucamartino.altervista.org/LAIS_non_iterative_code.zip, Statistics and Computing, 2016

Journal ref: Statistics and Computing, Volume 27, pages 599-623, 2017

arXiv:1501.04870 [pdf, other]

doi 10.1016/j.patcog.2015.01.004

Scalable Multi-Output Label Prediction: From Classifier Chains to Classifier Trellises

Authors: J. Read, L. Martino, P. Olmos, D. Luengo

Abstract: Multi-output inference tasks, such as multi-label classification, have become increasingly important in recent years. A popular method for multi-label classification is classifier chains, in which the predictions of individual classifiers are cascaded along a chain, thus taking into account inter-label dependencies and improving the overall performance. Several varieties of classifier chain methods have been introduced, and many of them perform very competitively across a wide range of benchmark datasets. However, scalability limitations become apparent on larger datasets when modeling a fully-cascaded chain. In particular, the methods' strategies for discovering and modeling a good chain structure constitute a major computational bottleneck. In this paper, we present the classifier trellis (CT) method for scalable multi-label classification. We compare CT with several recently proposed classifier chain methods to show that it occupies an important niche: it is highly competitive on standard multi-label problems, yet it can also scale up to thousands or even tens of thousands of labels.

Submitted 20 January, 2015; originally announced January 2015.

Comments: (accepted in Pattern Recognition)

Journal ref: Pattern Recognition, Volume 48, Issue 6, 2015, Pages 2096-2109

arXiv:1308.3779 [pdf, other]

doi 10.1186/s13634-017-0524-6

Adaptive Independent Sticky MCMC algorithms

Authors: L. Martino, R. Casarin, F. Leisen, D. Luengo

Abstract: In this work, we introduce a novel class of adaptive Monte Carlo methods, called adaptive independent sticky MCMC algorithms, for efficient sampling from a generic target probability density function (pdf). The new class of algorithms employs adaptive non-parametric proposal densities which become closer and closer to the target as the number of iterations increases. The proposal pdf is built using interpolation procedures based on a set of support points which is constructed iteratively based on previously drawn samples. The algorithm's efficiency is ensured by a test that controls the evolution of the set of support points. This extra stage controls the computational cost and the convergence of the proposal density to the target. Each part of the novel family of algorithms is discussed and several examples are provided. Although the novel algorithms are presented for univariate target densities, we show that they can be easily extended to the multivariate context within a Gibbs-type sampler. The ergodicity is ensured and discussed. Exhaustive numerical examples illustrate the efficiency of sticky schemes, both as stand-alone methods to sample from complicated one-dimensional pdfs and within Gibbs in order to draw from multi-dimensional target distributions.

Submitted 2 January, 2016; v1 submitted 17 August, 2013; originally announced August 2013.

Comments: A preliminary Matlab code is provided at https://www.mathworks.com/matlabcentral/fileexchange/54701-adaptive-independent-sticky-metropolis--aism--algorithm

Journal ref: EURASIP Journal on Advances in Signal Processing, Volume 5, page 1-28, 2018

arXiv:1304.3800 [pdf, other]

Extremely efficient generation of Gamma random variables for α ≥ 1

Authors: Luca Martino, David Luengo

Abstract: The Gamma distribution is well-known and widely used in many signal processing and communications applications. In this letter, a simple and extremely efficient accept/reject algorithm is introduced for the generation of independent random variables from a Gamma distribution with any shape parameter α ≥ 1. The proposed method uses another Gamma distribution with integer α_p ≤ α, from which samples can be easily drawn, as proposal function. For this reason, the new technique attains a higher acceptance rate (AR) for α ≥ 3 than all the methods currently available in the literature, with the AR tending to 1 as α diverges.

Submitted 25 June, 2013; v1 submitted 13 April, 2013; originally announced April 2013.
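
The accept/reject idea in the abstract above can be sketched as follows, using a Gamma proposal with integer shape α_p = floor(α) that is sampled as a sum of exponentials. This is only a hedged illustration, not the authors' algorithm: the abstract does not state the proposal rate, so the choice β_p = β·α_p/α (which matches the target mean and keeps the target-to-proposal ratio bounded) and the function name `sample_gamma` are assumptions.

```python
import numpy as np

def sample_gamma(alpha, beta, n, rng):
    """Draw n samples from Gamma(shape=alpha >= 1, rate=beta) by accept/reject."""
    if alpha < 1:
        raise ValueError("this sketch requires alpha >= 1")
    alpha_p = int(np.floor(alpha))    # integer proposal shape, alpha_p <= alpha
    beta_p = beta * alpha_p / alpha   # ASSUMED proposal rate (matches the target mean)
    delta = alpha - alpha_p           # fractional part of the shape, in [0, 1)
    x_star = alpha / beta             # point where target/proposal is largest
    out = []
    while len(out) < n:
        m = n - len(out)
        # A Gamma variable with integer shape is a sum of alpha_p exponentials.
        # 1 - random() lies in (0, 1], which avoids log(0).
        u = 1.0 - rng.random((m, alpha_p))
        x = -np.log(u).sum(axis=1) / beta_p
        # Log of the target/proposal ratio divided by its maximum (always <= 0).
        log_acc = delta * (np.log(x) - np.log(x_star)) - (beta - beta_p) * (x - x_star)
        keep = np.log(1.0 - rng.random(m)) <= log_acc
        out.extend(x[keep])
    return np.array(out[:n])

rng = np.random.default_rng(0)
s = sample_gamma(3.7, 2.0, 20000, rng)
print(s.mean(), s.var())   # compare with alpha/beta = 1.85 and alpha/beta^2 = 0.925
```

When α is an integer the proposal coincides with the target and every candidate is accepted; otherwise only the fractional part of the shape has to be corrected by the rejection step.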

arXiv:1212.6936 [pdf, other]

Blind Analysis of EGM Signals: Sparsity-Aware Formulation

Authors: David Luengo, Javier Via, Sandra Monzon, Tom Trigano, Antonio Artes-Rodriguez

Abstract: This technical note considers the problems of blind sparse learning and inference of electrogram (EGM) signals under atrial fibrillation (AF) conditions. First of all we introduce a mathematical model for the observed signals that takes into account the multiple foci typically appearing inside the heart during AF. Then we propose a reconstruction model based on a fixed dictionary and discuss several alternatives for choosing the dictionary. In order to obtain a sparse solution that takes into account the biological restrictions of the problem, a first alternative is using LASSO regularization followed by a post-processing stage that removes low amplitude coefficients violating the refractory period characteristic of cardiac cells. As an alternative we propose a novel regularization term, called cross products LASSO (CP-LASSO), that is able to incorporate the biological constraints directly into the optimization problem. Unfortunately, the resulting problem is non-convex, but we show how it can be solved efficiently in an approximated way making use of successive convex approximations (SCA). Finally, spectral analysis is performed on the clean activation sequence obtained from the sparse learning stage in order to estimate the number of latent foci and their frequencies. Simulations on synthetic and real data are provided to validate the proposed approach.

Submitted 31 December, 2012; originally announced December 2012.

Comments: 29 pages, 1 figure

arXiv:1212.0122 [pdf, other]

doi 10.1109/ICASSP.2013.6638846

Fully Adaptive Gaussian Mixture Metropolis-Hastings Algorithm

Authors: David Luengo, Luca Martino

Abstract: Markov Chain Monte Carlo methods are widely used in signal processing and communications for statistical inference and stochastic optimization. In this work, we introduce an efficient adaptive Metropolis-Hastings algorithm to draw samples from generic multi-modal and multi-dimensional target distributions. The proposal density is a mixture of Gaussian densities with all parameters (weights, mean vectors and covariance matrices) updated using all the previously generated samples applying simple recursive rules. Numerical results for the one and two-dimensional cases are provided.

Submitted 15 March, 2013; v1 submitted 1 December, 2012; originally announced December 2012.

arXiv:1211.2190 [pdf, other]

doi 10.1016/j.patcog.2013.10.006

Efficient Monte Carlo Methods for Multi-Dimensional Learning with Classifier Chains

Authors: Jesse Read, Luca Martino, David Luengo

Abstract: Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance, at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.

Submitted 7 September, 2013; v1 submitted 9 November, 2012; originally announced November 2012.

Comments: Submitted to Pattern Recognition

Journal ref: Pattern Recognition, Volume 47, Issue 3, Pages: 1535-1546, 2014

arXiv:1205.5494 [pdf, other]

doi 10.1109/TSP.2015.2420537

Improved Adaptive Rejection Metropolis Sampling Algorithms

Authors: Luca Martino, Jesse Read, David Luengo

Abstract: Markov Chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings (MH) algorithm, are widely used for Bayesian inference. One of the most important issues for any MCMC method is the convergence of the Markov chain, which depends crucially on a suitable choice of the proposal density. Adaptive Rejection Metropolis Sampling (ARMS) is a well-known MH scheme that generates samples from one-dimensional target densities making use of adaptive piecewise proposals constructed using support points taken from rejected samples. In this work we pinpoint a crucial drawback in the adaptive procedure in ARMS: support points might never be added inside regions where the proposal is below the target. When this happens in many regions it leads to a poor performance of ARMS, with the proposal never converging to the target. In order to overcome this limitation we propose two improved adaptive schemes for constructing the proposal. The first one is a direct modification of the ARMS procedure that incorporates support points inside regions where the proposal is below the target, while satisfying the diminishing adaptation property, one of the required conditions to assure the convergence of the Markov chain. The second one is an adaptive independent MH algorithm with the ability to learn from all previous samples except for the current state of the chain, thus also guaranteeing the convergence to the invariant density. These two new schemes improve the adaptive strategy of ARMS, thus simplifying the complexity in the construction of the proposals. Numerical results show that the new techniques provide better performance w.r.t. the standard ARMS.

Submitted 8 October, 2012; v1 submitted 24 May, 2012; originally announced May 2012.

Comments: Matlab code provided in http://a2rms.sourceforge.net/

Journal ref: Independent Doubly Adaptive Rejection Metropolis Sampling Within Gibbs Sampling, IEEE Transactions on Signal Processing, Volume 63, Issue 12, Pages 3123-3138, 2015

arXiv:1205.0482 [pdf, other]

On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling

Authors: Luca Martino, David Luengo, Joaquín Míguez

Abstract: In this work we investigate the relationship among three classical sampling techniques: the inverse of density (Khintchine's theorem), the transformed rejection (TR) and the generalized ratio of uniforms (GRoU). Given a monotonic probability density function (PDF), we show that the transformed area obtained using the generalized ratio of uniforms method can be found equivalently by applying the transformed rejection sampling approach to the inverse function of the target density. Then we provide an extension of the classical inverse of density idea, showing that it is completely equivalent to the GRoU method for monotonic densities. Although we concentrate on monotonic probability density functions (PDFs), we also discuss how the results presented here can be extended to any non-monotonic PDF that can be decomposed into a collection of intervals where it is monotonically increasing or decreasing. In this general case, we show the connections with transformations of certain random variables and the generalized inverse PDF with the GRoU technique. Finally, we also introduce a GRoU technique to handle unbounded target densities.

Submitted 16 July, 2013; v1 submitted 2 May, 2012; originally announced May 2012.

arXiv:1107.2699 [pdf, other]

Linear Latent Force Models using Gaussian Processes

Authors: Mauricio A. Álvarez, David Luengo, Neil D. Lawrence

Abstract: Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.

Submitted 13 March, 2020; v1 submitted 13 July, 2011; originally announced July 2011.

Comments: 20 pages, 2 figures. Extended technical report of the conference paper "Latent force models" in D. van Dyk and M. Welling (eds), Proceedings of the Twelfth International Workshop on Artificial Intelligence and Statistics, JMLR W&CP 5, Clearwater Beach, FL, pp. 9-16

arXiv:0912.3268 [pdf, other]

Variational Inducing Kernels for Sparse Convolved Multiple Output Gaussian Processes

Authors: Mauricio A. Álvarez, David Luengo, Michalis K. Titsias, Neil D. Lawrence

Abstract: Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Alvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.

Submitted 16 December, 2009; originally announced December 2009.

Comments: Technical report, 22 pages, 8 figures

Showing 1–21 of 21 results for author: Luengo, D