-
Mixing time of the conditional backward sampling particle filter
Authors:
Joona Karjalainen,
Anthony Lee,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
The conditional backward sampling particle filter (CBPF) is a powerful Markov chain Monte Carlo sampler for general state space hidden Markov model (HMM) smoothing. It was proposed as an improvement over the conditional particle filter (CPF), which is known to have an $O(T^2)$ computational time complexity under a general 'strong' mixing assumption, where $T$ is the time horizon. While there is empirical evidence of the superiority of the CBPF over the CPF in practice, this has never been theoretically quantified. We show that the CBPF has $O(T \log T)$ time complexity under strong mixing. In particular, the CBPF's mixing time is upper bounded by $O(\log T)$ for any sufficiently large number of particles $N$ that depends only on the mixing assumptions and not on $T$. We also show that an $O(\log T)$ mixing time is optimal. To prove our main result, we introduce a novel coupling of two CBPFs, which employs a maximal coupling of two particle systems at each time instant. Since the coupling is implementable, it also has practical applications. We use it to construct unbiased, finite-variance estimates of functionals that have arbitrary dependence on the latent state's path, with a total expected cost of $O(T \log T)$. As a specific application to real-data analysis, we construct unbiased estimates of the HMM's score function, leading to stochastic gradient maximum likelihood estimation of a financial time-series model. Finally, we investigate other couplings and show that some of these alternatives can have improved empirical behaviour.
Submitted 30 May, 2025; v1 submitted 29 December, 2023;
originally announced December 2023.
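Couplings of this kind are built from maximal couplings of discrete (index) distributions. As a hedged illustration, the sketch below draws a pair of indices $(I, J)$ with $I \sim p$, $J \sim q$ such that $P(I = J) = \sum_i \min(p_i, q_i)$, the largest value attainable. It is the generic textbook construction with independent residuals, not the specific two-CBPF coupling of the paper; the function name `maximal_coupling_categorical` is hypothetical.

```python
import numpy as np

def maximal_coupling_categorical(p, q, rng):
    """Draw (I, J) with I ~ p, J ~ q and P(I == J) = sum_i min(p_i, q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    overlap = np.minimum(p, q)
    alpha = overlap.sum()  # equals 1 - total variation distance between p and q
    if rng.uniform() < alpha:
        # Coupled branch: draw one index from the normalised overlap and use it for both
        i = rng.choice(len(p), p=overlap / alpha)
        return i, i
    # Residual branch: draw independently from the normalised leftover masses
    rp = (p - overlap) / (1.0 - alpha)
    rq = (q - overlap) / (1.0 - alpha)
    return rng.choice(len(p), p=rp), rng.choice(len(q), p=rq)
```

In a coupled particle filter, such a routine would typically be applied to the resampling or backward-sampling weights of the two systems so that their ancestor indices agree as often as possible.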
-
On the Forgetting of Particle Filters
Authors:
Joona Karjalainen,
Anthony Lee,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
We study the forgetting properties of the particle filter when its state (the collection of particles) is regarded as a Markov chain. Under a strong mixing assumption on the particle filter's underlying Feynman-Kac model, we find that the particle filter is exponentially mixing and forgets its initial state in $O(\log N)$ 'time', where $N$ is the number of particles and time refers to the number of particle filter algorithm steps, each comprising a selection (or resampling) and a mutation (or prediction) operation. We present an example which shows that this rate is optimal. In contrast, results available to date are extremely conservative, suggesting that $O(\alpha^N)$ time steps, for some $\alpha>1$, are needed for the particle filter to forget its initialisation. We also study the conditional particle filter (CPF) and extend our forgetting result to this context. We establish a similar conclusion, namely that the CPF is exponentially mixing and forgets its initial state in $O(\log N)$ time. To support this analysis, we establish new time-uniform $L^p$ error estimates for the CPF, which can be of independent interest. We also establish new propagation of chaos type results using our proof techniques, and discuss implications for couplings of particle filters and an application to processing out-of-sequence measurements.
Submitted 5 February, 2025; v1 submitted 15 September, 2023;
originally announced September 2023.
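To make "one step of the particle filter as a Markov chain on particle collections" concrete, here is a minimal sketch of a single bootstrap filter step: selection by the potential, then mutation. The AR(1) latent dynamics with Gaussian observation noise, the parameter names `a`, `sigma_x`, `sigma_y`, and the function name `pf_step` are all hypothetical choices for illustration; the paper works with general Feynman-Kac models under strong mixing.

```python
import numpy as np

def pf_step(particles, y, rng, a=0.9, sigma_x=1.0, sigma_y=1.0):
    """One bootstrap particle filter step for a hypothetical AR(1) + Gaussian-noise model.

    Assumed model (illustration only): x_t = a * x_{t-1} + sigma_x * V_t,
    y_t = x_t + sigma_y * W_t, with V_t, W_t standard normal.
    """
    n = len(particles)
    # Selection: weight each particle by the potential G(x) = p(y | x), up to a constant
    logw = -0.5 * ((y - particles) / sigma_y) ** 2
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    idx = rng.choice(n, size=n, p=w / w.sum())  # multinomial resampling
    # Mutation: propagate the selected particles through the latent dynamics
    return a * particles[idx] + sigma_x * rng.standard_normal(n)
```

The map from the current array of particles to the returned array is the Markov transition whose forgetting is studied; running it from two different initial arrays on the same observations is one informal way to watch the two clouds become statistically indistinguishable.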
-
Bayesian Deep Learning with Multilevel Trace-class Neural Networks
Authors:
Neil K. Chada,
Ajay Jasra,
Kody J. H. Law,
Sumeetpal S. Singh
Abstract:
In this article we consider Bayesian inference associated to deep neural networks (DNNs) and in particular, trace-class neural network (TNN) priors which can be preferable to traditional DNNs as (a) they are identifiable and (b) they possess desirable convergence properties. TNN priors are defined on functions with infinitely many hidden units, and have strongly convergent Karhunen-Loève-type approximations with finitely many hidden units. A practical hurdle is that the Bayesian solution is computationally demanding, requiring simulation methods, so approaches to drive down the complexity are needed. In this paper, we leverage the strong convergence of TNN in order to apply Multilevel Monte Carlo (MLMC) to these models. In particular, a previously introduced MLMC method is used to approximate posterior expectations of Bayesian TNN models with optimal computational complexity, and this is proved mathematically. The results are verified with several numerical experiments on model problems arising in machine learning, including regression, classification, and reinforcement learning.
Submitted 3 May, 2025; v1 submitted 24 March, 2022;
originally announced March 2022.
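The MLMC idea rests on the telescoping identity $\mathbb{E}[P_L] = \mathbb{E}[P_0] + \sum_{l=1}^{L} \mathbb{E}[P_l - P_{l-1}]$, where the pair $(P_l, P_{l-1})$ is coupled so that the differences have small variance and need few samples. The sketch below is a generic plain MLMC estimator, not the posterior-expectation method of the paper. It uses a hypothetical toy model in which level $l$ truncates a random series to $2^l$ terms, loosely mimicking a strongly convergent approximation with finitely many hidden units; coupling comes from sharing the same Gaussian draws across the two levels.

```python
import numpy as np

def level_sample(level, rng):
    """Coupled pair (P_l, P_{l-1}) for a hypothetical truncated random series.

    P_l = (sum_{k=1}^{2^l} Z_k / k^2)^2; both levels share the same Z_k.
    """
    n = 2 ** level
    z = rng.standard_normal(n)
    k = np.arange(1, n + 1)
    fine = np.sum(z / k**2) ** 2
    if level == 0:
        return fine, 0.0  # P_{-1} is taken to be 0
    half = n // 2
    coarse = np.sum(z[:half] / k[:half] ** 2) ** 2  # same Z_k, fewer terms
    return fine, coarse

def mlmc_estimate(max_level, samples_per_level, rng):
    """Sum over levels of the sample mean of P_l - P_{l-1}."""
    total = 0.0
    for level in range(max_level + 1):
        n_samples = samples_per_level[level]
        diffs = [f - c for f, c in (level_sample(level, rng) for _ in range(n_samples))]
        total += np.mean(diffs)
    return total
```

Here $\mathbb{E}[P_l] = \sum_{k=1}^{2^l} k^{-4}$, which tends to $\pi^4/90$, so the estimate can be checked directly; the decreasing sample sizes per level reflect the shrinking variance of the coupled differences.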
-
On resampling schemes for particle filters with weakly informative observations
Authors:
Nicolas Chopin,
Sumeetpal S. Singh,
Tomás Soto,
Matti Vihola
Abstract:
We consider particle filters with weakly informative observations (or 'potentials') relative to the latent state dynamics. The particular focus of this work is on particle filters to approximate time-discretisations of continuous-time Feynman-Kac path integral models, a scenario that naturally arises when addressing filtering and smoothing problems in continuous time, but our findings are also indicative of weakly informative settings beyond this context. We study the performance of different resampling schemes, such as systematic resampling, SSP (Srinivasan sampling process) and stratified resampling, as the time-discretisation becomes finer, and also identify their continuous-time limit, which is expressed as a suitably defined 'infinitesimal generator'. By contrasting these generators, we find that (certain modifications of) systematic and SSP resampling 'dominate' stratified and independent 'killing' resampling in terms of their limiting overall resampling rate. The reduced intensity of resampling manifests itself in lower variance in our numerical experiment. This efficiency result, through an ordering of the resampling rate, is new to the literature. The second major contribution of this work concerns the analysis of the limiting behaviour of the entire population of particles of the particle filter as the time discretisation becomes finer. We provide the first proof, under general conditions, that the particle approximation of the discretised continuous-time Feynman-Kac path integral models converges to a (uniformly weighted) continuous-time particle system.
Submitted 9 July, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
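For reference, the two classical schemes named above differ only in how the uniforms are drawn: stratified resampling uses one independent uniform in each stratum $[k/N, (k+1)/N)$, while systematic resampling shifts all strata by a single shared uniform. The sketch below shows the plain versions of both via inversion of the weight CDF; it does not implement SSP or the modified systematic scheme analysed in the paper, and the function names are hypothetical.

```python
import numpy as np

def _inverse_cdf(weights, u):
    """Map sorted uniforms u in [0, 1) to ancestor indices via the weight CDF."""
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0  # guard against round-off so every u maps to a valid index
    return np.searchsorted(cdf, u)

def stratified_resample(weights, rng):
    n = len(weights)
    # one independent uniform per stratum [k/n, (k+1)/n)
    u = (np.arange(n) + rng.uniform(size=n)) / n
    return _inverse_cdf(weights, u)

def systematic_resample(weights, rng):
    n = len(weights)
    # a single uniform shared by all strata
    u = (np.arange(n) + rng.uniform()) / n
    return _inverse_cdf(weights, u)
```

A useful property of systematic resampling is that particle $i$ is copied either $\lfloor N w_i \rfloor$ or $\lceil N w_i \rceil$ times, so nearly uniform weights (the weakly informative regime) lead to few changes in the population.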
-
Backward Itô-Ventzell and stochastic interpolation formulae
Authors:
Pierre del Moral,
Sumeetpal Sidhu Singh
Abstract:
We present a novel backward Itô-Ventzell formula and an extension of the Alekseev-Gröbner interpolating formula to stochastic flows. We also present some natural spectral conditions that yield direct and simple proofs of time-uniform estimates of the difference between two stochastic flows when their drift and diffusion functions are not the same; these appear to be the first results of this type for this class of anticipative models. We illustrate the impact of these results in the context of diffusion perturbation theory, interacting diffusions and discrete time approximations.
Submitted 4 May, 2021; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Asymptotic Analysis of Model Selection Criteria for General Hidden Markov Models
Authors:
Shouto Yonekura,
Alexandros Beskos,
Sumeetpal S. Singh
Abstract:
The paper obtains analytical results for the asymptotic properties of Model Selection Criteria -- widely used in practice -- for a general family of hidden Markov models (HMMs), thereby substantially extending the related theory beyond typical i.i.d.-like model structures and filling in an important gap in the relevant literature. In particular, we look at the Bayesian and Akaike Information Criteria (BIC and AIC) and the model evidence. In the setting of nested classes of models, we prove that BIC and the evidence are strongly consistent for HMMs (under regularity conditions), whereas AIC is not weakly consistent. Numerical experiments support our theoretical results.
Submitted 30 March, 2020; v1 submitted 28 November, 2018;
originally announced November 2018.
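For readers less familiar with the criteria compared above, the standard definitions are $\mathrm{AIC} = 2k - 2\log \hat{L}$ and $\mathrm{BIC} = k \log n - 2\log \hat{L}$, where $\hat{L}$ is the maximised likelihood, $k$ the number of parameters and $n$ the number of observations (for an HMM observed over $T$ steps, one would take $n = T$, an assumption of this sketch). Smaller values are preferred. The helper names below are hypothetical.

```python
import math

def aic(max_log_lik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * max_log_lik

def bic(max_log_lik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2 * max_log_lik

def select(models, criterion):
    """Return the name of the model minimising `criterion`; models maps name -> kwargs."""
    return min(models, key=lambda name: criterion(**models[name]))
```

Because the BIC penalty grows like $\log n$ while the AIC penalty is fixed, BIC penalises extra parameters more heavily as data accumulate, which is the intuition behind the consistency contrast between the two criteria.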
-
Coupled conditional backward sampling particle filter
Authors:
Anthony Lee,
Sumeetpal S. Singh,
Matti Vihola
Abstract:
The conditional particle filter (CPF) is a promising algorithm for general hidden Markov model smoothing. Empirical evidence suggests that the variant of CPF with backward sampling (CBPF) performs well even with long time series. Previous theoretical results have not been able to demonstrate the improvement brought by backward sampling, whereas we provide rates showing that CBPF can remain effective with a fixed number of particles independent of the time horizon. Our result is based on analysis of a new coupling of two CBPFs, the coupled conditional backward sampling particle filter (CCBPF). We show that CCBPF has good stability properties in the sense that, with a fixed number of particles, the coupling time in terms of iterations increases only linearly with respect to the time horizon under a general (strong mixing) condition. The CCBPF is useful not only as a theoretical tool, but also as a practical method that allows for unbiased estimation of smoothing expectations, following the recent developments by Jacob et al. (to appear). Unbiased estimation has many advantages, such as enabling the construction of asymptotically exact confidence intervals and straightforward parallelisation.
Submitted 28 August, 2019; v1 submitted 15 June, 2018;
originally announced June 2018.
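The backward sampling step that distinguishes CBPF from CPF can be sketched as follows: after a forward pass has stored particles $x_t^{1:N}$ and log-weights at every time, a trajectory is drawn backwards by reweighting time-$t$ particles by $w_t^i f(x_{t+1} \mid x_t^i)$. This minimal sketch shows only that backward pass, assuming a hypothetical Gaussian AR(1) transition density with parameters `a` and `sigma_x`; the conditional forward pass (which keeps a reference trajectory) and the coupling of two such samplers are not shown.

```python
import numpy as np

def backward_sample(particles, logw, rng, a=0.9, sigma_x=1.0):
    """Draw one trajectory by backward sampling.

    particles, logw: arrays of shape (T, N) holding filter particles and
    log-weights at each time. Transition assumed (hypothetically) to be
    x_{t+1} | x_t ~ N(a * x_t, sigma_x^2).
    """
    T, N = particles.shape
    path = np.empty(T)
    # Final state: draw from the terminal filter weights
    w = np.exp(logw[-1] - logw[-1].max())
    j = rng.choice(N, p=w / w.sum())
    path[-1] = particles[-1, j]
    for t in range(T - 2, -1, -1):
        # Reweight time-t particles by the transition density to the chosen x_{t+1}
        lb = logw[t] - 0.5 * ((path[t + 1] - a * particles[t]) / sigma_x) ** 2
        b = np.exp(lb - lb.max())
        j = rng.choice(N, p=b / b.sum())
        path[t] = particles[t, j]
    return path
```

Unlike tracing back the ancestral genealogy, each backward step may select any particle at time $t$, which is what lets the sampled path escape path degeneracy over long horizons.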
-
On the loss of Fisher information in some multi-object tracking observation models
Authors:
Jeremie Houssineau,
Ajay Jasra,
Sumeetpal S. Singh
Abstract:
The concept of Fisher information can be useful even in cases where the probability distributions of interest are not absolutely continuous with respect to the natural reference measure on the underlying space. Practical examples where this extension is useful are provided in the context of multi-object tracking statistical models. Upon defining the Fisher information without introducing a reference measure, we provide remarkably concise proofs of the loss of Fisher information in some widely used multi-object tracking observation models.
Submitted 26 March, 2018;
originally announced March 2018.
-
Identification of multi-object dynamical systems: consistency and Fisher information
Authors:
Jeremie Houssineau,
Sumeetpal S. Singh,
Ajay Jasra
Abstract:
Learning the model parameters of a multi-object dynamical system from partial and perturbed observations is a challenging task. Despite recent numerical advancements in learning these parameters, theoretical guarantees are extremely scarce. In this article, we study the identifiability of these parameters and the consistency of the corresponding maximum likelihood estimate (MLE) under assumptions on the different components of the underlying multi-object system. In order to understand the impact of the various sources of observation noise on the ability to learn the model parameters, we study the asymptotic variance of the MLE through the associated Fisher information matrix. For example, we show that specific aspects of the multi-target tracking (MTT) problem such as detection failures and unknown data association lead to a loss of information which is quantified in special cases of interest.
Submitted 13 July, 2017;
originally announced July 2017.
-
Blocking Strategies and Stability of Particle Gibbs Samplers
Authors:
Sumeetpal S. Singh,
Fredrik Lindsten,
Eric Moulines
Abstract:
Sampling from the conditional (or posterior) probability distribution of the latent states of a Hidden Markov Model, given the realization of the observed process, is a non-trivial problem in the context of Markov chain Monte Carlo. To do this, Andrieu et al. (2010) constructed a Markov kernel which leaves this conditional distribution invariant using a Particle Filter. From a practitioner's point of view, this Markov kernel attempts to mimic the act of sampling all the latent state variables as one block from the posterior distribution, but for models where exact simulation is not possible. There are some recent theoretical results that establish the uniform ergodicity of this Markov kernel and that the mixing rate does not diminish provided the number of particles grows at least linearly with the number of latent states in the posterior. This gives rise to a cost, per application of the kernel, that is quadratic in the number of latent states, which could be prohibitive for long observation sequences. We seek to answer an obvious but important question: is there a different implementation with a cost per iteration that grows linearly with the number of latent states, but which is still stable in the sense that its mixing rate does not deteriorate? We address this problem using blocking strategies, which are easily parallelizable, and prove stability of the resulting sampler.
Submitted 28 September, 2015;
originally announced September 2015.
-
Distributed Maximum Likelihood for Simultaneous Self-localization and Tracking in Sensor Networks
Authors:
Nikolas Kantas,
Sumeetpal S. Singh,
Arnaud Doucet
Abstract:
We show that the sensor self-localization problem can be cast as a static parameter estimation problem for Hidden Markov Models and we implement fully decentralized versions of the Recursive Maximum Likelihood and on-line Expectation-Maximization algorithms to localize the sensor network simultaneously with target tracking. For linear Gaussian models, our algorithms can be implemented exactly using a distributed version of the Kalman filter and a novel message passing algorithm. The latter allows each node to compute the local derivatives of the likelihood or the sufficient statistics needed for Expectation-Maximization. In the non-linear case, a solution based on local linearization in the spirit of the Extended Kalman Filter is proposed. In numerical examples we demonstrate that the developed algorithms are able to learn the localization parameters.
Submitted 19 June, 2012;
originally announced June 2012.
-
Asymptotic Behaviour of Approximate Bayesian Estimators
Authors:
Thomas A. Dean,
Sumeetpal S. Singh
Abstract:
Although approximate Bayesian computation (ABC) has become a popular technique for performing parameter estimation when the likelihood functions are analytically intractable, there has not yet been a complete investigation of the theoretical properties of the resulting estimators. In this paper we give a theoretical analysis of the asymptotic properties of ABC-based parameter estimators for hidden Markov models and show that ABC-based estimators satisfy asymptotically biased versions of the standard results in the statistical literature.
Submitted 18 May, 2011;
originally announced May 2011.
-
Parameter Estimation for Hidden Markov Models with Intractable Likelihoods
Authors:
Thomas A. Dean,
Sumeetpal S. Singh,
Ajay Jasra,
Gareth W. Peters
Abstract:
Approximate Bayesian computation (ABC) is a popular technique for approximating likelihoods and is often used in parameter estimation when the likelihood functions are analytically intractable. Although the use of ABC is widespread in many fields, there has been little investigation of the theoretical properties of the resulting estimators. In this paper we give a theoretical analysis of the asymptotic properties of ABC-based maximum likelihood parameter estimation for hidden Markov models. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how Sequential Monte Carlo methods provide a natural way to implement likelihood-based ABC procedures.
Submitted 28 March, 2011;
originally announced March 2011.
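As a hedged illustration of how Sequential Monte Carlo can implement a likelihood-based ABC procedure for an HMM, the sketch below runs a bootstrap particle filter in which the observation density is never evaluated: a pseudo-observation is simulated for each particle and weighted by the uniform ABC kernel $\mathbf{1}\{|\tilde{y} - y_t| \le \epsilon\}/(2\epsilon)$. The AR(1) model, the parameter tuple `theta = (a, sigma_x, sigma_y)` and the function name are hypothetical; maximising the returned estimate over `theta` would give an ABC-type maximum likelihood estimate.

```python
import numpy as np

def abc_pf_loglik(y, theta, eps, n_particles, rng):
    """Bootstrap particle filter estimate of a uniform-kernel ABC log-likelihood.

    Hypothetical model: x_t = a x_{t-1} + sigma_x V_t, y_t = x_t + sigma_y W_t, |a| < 1.
    """
    a, sigma_x, sigma_y = theta
    # Initialise from the stationary distribution of the AR(1) latent process
    x = rng.standard_normal(n_particles) * sigma_x / np.sqrt(1 - a**2)
    loglik = 0.0
    for y_t in y:
        # Simulate pseudo-observations instead of evaluating g(y_t | x)
        y_sim = x + sigma_y * rng.standard_normal(n_particles)
        hits = np.abs(y_sim - y_t) <= eps
        if not hits.any():
            return -np.inf  # no particle matched: the estimate is zero
        # Uniform-kernel ABC weight: 1{|y_sim - y_t| <= eps} / (2 eps), averaged over particles
        loglik += np.log(hits.mean() / (2 * eps))
        # Resample uniformly among matching particles, then propagate
        idx = rng.choice(np.flatnonzero(hits), size=n_particles)
        x = a * x[idx] + sigma_x * rng.standard_normal(n_particles)
    return loglik
```

Replacing the exact observation density by this kernel is what introduces the $\epsilon$-dependent bias that asymptotic analyses of ABC estimators have to account for.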
-
A Backward Particle Interpretation of Feynman-Kac Formulae
Authors:
Pierre Del Moral,
Arnaud Doucet,
Sumeetpal S. Singh
Abstract:
We design a particle interpretation of Feynman-Kac measures on path spaces based on a backward Markovian representation combined with a traditional mean field particle interpretation of the flow of their final time marginals. In contrast to traditional genealogical tree based models, these new particle algorithms can be used to compute normalized additive functionals "on-the-fly" as well as their limiting occupation measures with a given precision degree that does not depend on the final time horizon.
We provide uniform convergence results w.r.t. the time horizon parameter as well as functional central limit theorems and exponential concentration estimates. We also illustrate these results in the context of computational physics and imaginary time Schroedinger type partial differential equations, with a special interest in the numerical approximation of the invariant measure associated to $h$-processes.
Submitted 18 August, 2009;
originally announced August 2009.