-
Quantifying the Speed-Up from Non-Reversibility in MCMC Tempering Algorithms
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We investigate the increase in efficiency of simulated and parallel tempering MCMC algorithms when using non-reversible updates to give them "momentum". By making a connection to a certain simple discrete Markov chain, we show that, under appropriate assumptions, the non-reversible algorithms still exhibit diffusive behaviour, just on a different time scale. We use this to argue that the optimally scaled versions of the non-reversible algorithms are indeed more efficient than the optimally scaled versions of their traditional reversible counterparts, but only by a modest speed-up factor of about 42%.
Submitted 27 January, 2025;
originally announced January 2025.
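To make the "momentum" idea concrete, here is a toy Python sketch. It is not the paper's model or its discrete chain. A temperature index moves on levels 0..N-1, and every proposed move to a neighbouring level is accepted with one fixed probability, an assumption made only for illustration. The reversible version proposes up or down at random. The non-reversible (lifted) version keeps its direction and reverses only after a rejection or at a boundary. The function name `count_round_trips` is hypothetical.

```python
import random


def count_round_trips(n_levels: int, accept_prob: float, n_steps: int,
                      lifted: bool, seed: int = 0) -> int:
    """Count 0 -> top -> 0 round trips of a toy temperature-index process.

    Toy assumptions (not the paper's model): levels 0..n_levels-1, and every
    proposed move to a neighbouring level is accepted with the same
    probability accept_prob.
    """
    rng = random.Random(seed)
    top = n_levels - 1
    level, direction = 0, +1
    reached_top = False
    trips = 0
    for _ in range(n_steps):
        if lifted:
            # Non-reversible ("momentum") version: keep moving in the current
            # direction; after a rejection or at a boundary, reverse direction.
            target = level + direction
            if 0 <= target <= top and rng.random() < accept_prob:
                level = target
            else:
                direction = -direction
        else:
            # Reversible version: propose up or down with probability 1/2 each.
            target = level + rng.choice((-1, +1))
            if 0 <= target <= top and rng.random() < accept_prob:
                level = target
        if level == top:
            reached_top = True
        elif level == 0 and reached_top:
            trips += 1
            reached_top = False
    return trips
```

With the same fixed acceptance probability (for example `count_round_trips(10, 0.8, 100_000, lifted=True)` versus `lifted=False`), the lifted walk typically completes many more round trips. This toy comparison does not address the abstract's point about optimally scaled algorithms, where the reported gain is only about 42%.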
-
Upper and lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo
Authors:
Austin Brown,
Jeffrey S. Rosenthal
Abstract:
We investigate lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo under any adaptation strategy. In particular, we prove general lower bounds in total variation and on the weak convergence rate under general adaptation plans. If the adaptation diminishes sufficiently fast, we also develop comparable convergence rate upper bounds that are capable of approximately matching the convergence rate in the subgeometric lower bound. These results provide insight into the optimal design of adaptation strategies and also limitations on the convergence behavior of adaptive Markov chain Monte Carlo. Applications to an adaptive unadjusted Langevin algorithm as well as adaptive Metropolis-Hastings with independent proposals and random-walk proposals are explored.
Submitted 25 November, 2024;
originally announced November 2024.
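As an illustration of an adaptation plan that diminishes over time, here is a hypothetical one-dimensional adaptive random-walk Metropolis sketch in Python. The log proposal scale is updated by a Robbins-Monro step of size $n^{-0.6}$ toward a target acceptance rate. The exponent 0.6, the target rate, and the function name `adaptive_rwm` are assumptions; this is not one of the adaptation schemes analysed in the paper.

```python
import math
import random


def adaptive_rwm(log_target, x0, n_iter, target_accept=0.234, seed=0):
    """Hypothetical 1-D adaptive random-walk Metropolis sketch.

    The log proposal scale moves toward `target_accept` with step size
    n**-0.6, so the amount of adaptation diminishes as n grows.
    Returns (samples, final proposal scale, empirical acceptance rate).
    """
    rng = random.Random(seed)
    x, log_scale = x0, 0.0
    samples, accepted = [], 0
    for n in range(1, n_iter + 1):
        y = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        # Metropolis acceptance probability, computed on the log scale.
        alpha = math.exp(min(0.0, log_target(y) - log_target(x)))
        if rng.random() < alpha:
            x = y
            accepted += 1
        # Diminishing (Robbins-Monro style) adaptation of the proposal scale.
        log_scale += n ** -0.6 * (alpha - target_accept)
        samples.append(x)
    return samples, math.exp(log_scale), accepted / n_iter
```

For a standard normal target, `adaptive_rwm(lambda t: -0.5 * t * t, 0.0, 20_000)` returns the chain, the adapted scale, and the overall acceptance rate. Because the step size decays, later iterations change the kernel less and less.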
-
Exploring the generalizability of the optimal 0.234 acceptance rate in random-walk Metropolis and parallel tempering algorithms
Authors:
Aidan Li,
Liyan Wang,
Tianye Dou,
Jeffrey S. Rosenthal
Abstract:
For random-walk Metropolis (RWM) and parallel tempering (PT) algorithms, an asymptotic acceptance rate of around 0.234 is known to be optimal in certain high-dimensional limits. However, its practical relevance is uncertain due to restrictive derivation conditions. We synthesise previous theoretical advances in extending the 0.234 acceptance rate to more general settings, and demonstrate its applicability with a comprehensive empirical simulation study on examples examining how acceptance rates affect Expected Squared Jumping Distance (ESJD). Our experiments show that the optimality of the 0.234 acceptance rate for RWM is surprisingly robust, even in lower dimensions, across various proposal distributions, multimodal target distributions that may not have an i.i.d. product density, and curved Rosenbrock target distributions with nonlinear correlation structure. Parallel tempering experiments also show that the idealized 0.234 spacing of inverse temperatures may be approximately optimal for low dimensions and non-i.i.d. product target densities, and that constructing an inverse temperature ladder with spacings given by a swap acceptance rate of 0.234 is a viable strategy.
Submitted 11 June, 2025; v1 submitted 13 August, 2024;
originally announced August 2024.
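The Expected Squared Jumping Distance used in the study is the average of $\|X_{n+1}-X_n\|^2$ along the chain. A minimal Python sketch records both the acceptance rate and the ESJD for a given $\ell$. It assumes a standard Gaussian target and the classical proposal scaling $N(x, (\ell^2/d) I)$, for which $\ell \approx 2.38$ gives acceptance near 0.234 in the high-dimensional limit. The helper name is hypothetical.

```python
import numpy as np


def rwm_acceptance_and_esjd(dim: int, ell: float, n_iter: int, seed: int = 0):
    """Run RWM on a standard Gaussian target in `dim` dimensions.

    The proposal is N(x, (ell**2 / dim) I). Returns the empirical acceptance
    rate and the ESJD, i.e. the average of ||X_{n+1} - X_n||^2 (rejected
    proposals contribute a jump of zero).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # start in stationarity
    sigma = ell / np.sqrt(dim)
    accepted, sq_jump = 0, 0.0
    for _ in range(n_iter):
        y = x + sigma * rng.standard_normal(dim)
        log_ratio = 0.5 * (x @ x - y @ y)  # log pi(y) - log pi(x)
        if np.log(rng.uniform()) < log_ratio:
            sq_jump += float((y - x) @ (y - x))
            x = y
            accepted += 1
    return accepted / n_iter, sq_jump / n_iter
```

Sweeping `ell` over a grid such as 0.5, 2.38 and 6.0 and plotting ESJD against the acceptance rate gives the kind of curve the study examines. On this Gaussian target the ESJD peak should sit close to the 0.234 acceptance rate.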
-
Comparing the Efficiency of General State Space Reversible MCMC Algorithms
Authors:
Geoffrey T. Salmon,
Jeffrey S. Rosenthal
Abstract:
We review and provide new proofs of results used to compare the efficiency of estimates generated by reversible MCMC algorithms on a general state space. We provide a full proof of the formula for the asymptotic variance for real-valued functionals on $φ$-irreducible reversible Markov chains, first introduced by Kipnis and Varadhan. Given two Markov kernels $P$ and $Q$ with stationary measure $π$, we say that the Markov kernel $P$ efficiency dominates the Markov kernel $Q$ if the asymptotic variance with respect to $P$ is at most the asymptotic variance with respect to $Q$ for every real-valued functional $f\in L^2(π)$. Assuming only a basic background in functional analysis, we prove that for two $φ$-irreducible reversible Markov kernels $P$ and $Q$, $P$ efficiency dominates $Q$ if and only if the operator $Q-P$, where $P$ is the operator on $L^2(π)$ that maps $f\mapsto\int f(y)P(\cdot,dy)$ and similarly for $Q$, is positive on $L^2(π)$, i.e. $\langle f,(Q-P)f\rangle\geq0$ for every $f\in L^2(π)$. We use this result to show that reversible antithetic kernels are more efficient than i.i.d. sampling, and that efficiency dominance is a partial ordering on $φ$-irreducible reversible Markov kernels. We also provide a proof, based on that of Tierney, that Peskun dominance is a sufficient condition for efficiency dominance for reversible kernels. Using these results, we show that a Markov kernel formed by randomly selecting other "component" Markov kernels will always efficiency dominate another Markov kernel formed in this way, as long as the component kernels of the former efficiency dominate those of the latter. These results on the efficiency dominance of combining component kernels generalise the results on the efficiency dominance of combined chains introduced by Neal and Rosenthal from finite state spaces to general state spaces.
Submitted 7 August, 2024;
originally announced August 2024.
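On a finite state space, Peskun dominance says that $P$ puts at least as much probability as $Q$ on every off-diagonal move. The Python sketch below checks that entrywise condition. It is a finite-state illustration only, since the paper works on general state spaces. The example kernels, a Metropolis kernel with uniform proposals and its lazy version $Q = (I+P)/2$, are assumptions, as are the helper names.

```python
import numpy as np


def peskun_dominates(P: np.ndarray, Q: np.ndarray, tol: float = 1e-12) -> bool:
    """Return True if P(x, y) >= Q(x, y) for every pair of states x != y."""
    off_diag = ~np.eye(P.shape[0], dtype=bool)
    return bool(np.all(P[off_diag] >= Q[off_diag] - tol))


def metropolis_kernel(pi: np.ndarray) -> np.ndarray:
    """Metropolis kernel for target pi with a uniform proposal over the other states."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y:
                P[x, y] = min(1.0, pi[y] / pi[x]) / (n - 1)
        P[x, x] = 1.0 - P[x].sum()  # rejected proposals stay at x
    return P
```

For `pi = np.array([0.5, 0.3, 0.2])`, `P = metropolis_kernel(pi)` Peskun-dominates `Q = 0.5 * np.eye(3) + 0.5 * P`. Both are reversible with respect to `pi`, so by the result stated in the abstract, `P` also efficiency dominates `Q`.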
-
Weak convergence of adaptive Markov chain Monte Carlo
Authors:
Austin Brown,
Jeffrey S. Rosenthal
Abstract:
This article develops general conditions for weak convergence of adaptive Markov chain Monte Carlo processes, and shows that weak convergence implies a weak law of large numbers for bounded Lipschitz continuous functions. This allows an estimation theory for adaptive Markov chain Monte Carlo where previously developed theory in total variation may fail or be difficult to establish. Extensions of weak convergence to general Wasserstein distances are established, along with a weak law of large numbers for possibly unbounded Lipschitz functions. The results are applied to auto-regressive processes in various settings, unadjusted Langevin processes, and adaptive Metropolis-Hastings.
Submitted 23 December, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Estimating MCMC convergence rates using common random number simulation
Authors:
Sabrina Sixta,
Jeffrey S. Rosenthal,
Austin Brown
Abstract:
This paper shows how to use common random number (CRN) simulation to evaluate Markov chain Monte Carlo (MCMC) convergence to stationarity. We provide an upper bound on the Wasserstein distance of a Markov chain to its stationary distribution after $N$ steps in terms of averages over CRN simulations. We apply our bound to Gibbs samplers on a variance component model, a model related to James-Stein estimators, and a Bayesian linear regression model. For the first two examples, we show that the CRN simulated bound converges to zero significantly more quickly than the available drift and minorization bounds.
Submitted 29 May, 2025; v1 submitted 27 September, 2023;
originally announced September 2023.
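The basic coupling idea behind CRN simulation is that two copies of a chain driven by the same random inputs form a coupling. If one copy starts in stationarity, the average distance between the copies after $N$ steps upper-bounds the Wasserstein-1 distance of the other copy to $\pi$. The Python sketch below shows only this coupling step, on a hypothetical AR(1) chain. It does not reproduce the paper's bound.

```python
import numpy as np


def crn_distance_bound(x0: float, rho: float, n_steps: int, n_reps: int,
                       seed: int = 0) -> float:
    """Monte Carlo estimate of E|X_N - Y_N| under common random numbers.

    Toy chain (an assumption for illustration): X_{n+1} = rho * X_n + Z_{n+1}
    with Z ~ N(0, 1), whose stationary law is N(0, 1 / (1 - rho**2)).
    X starts at x0 and Y starts in stationarity. Both copies use the same
    innovations Z, so (X_N, Y_N) is a coupling with Y_N ~ pi and
    E|X_N - Y_N| upper-bounds the Wasserstein-1 distance W1(L(X_N), pi).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_reps, float(x0))
    y = rng.normal(0.0, 1.0 / np.sqrt(1.0 - rho**2), size=n_reps)
    for _ in range(n_steps):
        z = rng.standard_normal(n_reps)  # common random numbers for both copies
        x = rho * x + z
        y = rho * y + z
    return float(np.mean(np.abs(x - y)))
```

For this linear chain the CRN distance contracts by exactly a factor of `rho` per step, so with the same seed `crn_distance_bound(5.0, 0.5, 10, 1000)` equals `0.5**10` times `crn_distance_bound(5.0, 0.5, 0, 1000)` up to rounding. For nonlinear chains such as Gibbs samplers, the contraction has to be estimated from the simulations.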
-
Efficiency of reversible MCMC methods: elementary derivations and applications to composite methods
Authors:
Radford M. Neal,
Jeffrey S. Rosenthal
Abstract:
We review criteria for comparing the efficiency of Markov chain Monte Carlo (MCMC) methods with respect to the asymptotic variance of estimates of expectations of functions of state, and show how such criteria can justify ways of combining improvements to MCMC methods. We say that a chain on a finite state space with transition matrix $P$ efficiency-dominates one with transition matrix $Q$ if for every function of state it has lower (or equal) asymptotic variance. We give elementary proofs of some previous results regarding efficiency dominance, leading to a self-contained demonstration that a reversible chain with transition matrix $P$ efficiency-dominates a reversible chain with transition matrix $Q$ if and only if none of the eigenvalues of $Q-P$ are negative. This allows us to conclude that modifying a reversible MCMC method to improve its efficiency will also improve the efficiency of a method that randomly chooses either this or some other reversible method, and to conclude that improving the efficiency of a reversible update for one component of state (as in Gibbs sampling) will improve the overall efficiency of a reversible method that combines this and other updates. It also explains how antithetic MCMC can be more efficient than i.i.d. sampling. We also establish conditions that can guarantee that a method is not efficiency-dominated by any other method.
Submitted 27 March, 2024; v1 submitted 29 May, 2023;
originally announced May 2023.
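For a finite, irreducible chain, both sides of the eigenvalue criterion can be computed directly. The Python sketch below evaluates the asymptotic variance through the fundamental matrix $Z = (I - P + \mathbf{1}\pi^\top)^{-1}$, using $\sigma^2(f) = -\langle \bar f, \bar f\rangle_\pi + 2\langle \bar f, Z\bar f\rangle_\pi$ with $\bar f = f - \pi(f)$. It then checks the eigenvalue condition on $Q-P$ after symmetrising with $\mathrm{diag}(\pi)^{1/2}$, which is valid when both chains are reversible with respect to the same $\pi$. The helper names are hypothetical.

```python
import numpy as np


def stationary(P: np.ndarray) -> np.ndarray:
    """Stationary distribution of an irreducible transition matrix."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])  # pi P = pi and sum(pi) = 1
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]


def asymptotic_variance(P: np.ndarray, f: np.ndarray) -> float:
    """Asymptotic variance of the ergodic average of f under P."""
    pi = stationary(P)
    n = len(pi)
    fbar = f - pi @ f  # centre f under pi
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))  # fundamental matrix

    def inner(g, h):
        return float(np.sum(pi * g * h))  # <g, h>_pi

    return -inner(fbar, fbar) + 2.0 * inner(fbar, Z @ fbar)


def efficiency_dominates(P: np.ndarray, Q: np.ndarray, tol: float = 1e-10) -> bool:
    """For P, Q reversible w.r.t. the same pi: True iff Q - P has no negative eigenvalues."""
    pi = stationary(P)
    d = np.sqrt(pi)
    S = (d[:, None] * (Q - P)) / d[None, :]  # D^{1/2} (Q - P) D^{-1/2}, symmetric
    S = 0.5 * (S + S.T)  # remove round-off asymmetry
    return bool(np.linalg.eigvalsh(S).min() >= -tol)
```

Consider the two-state antithetic kernel `[[1/4, 3/4], [3/4, 1/4]]` and i.i.d. sampling with every row equal to $\pi = (1/2, 1/2)$. For $f(x) = x$, the asymptotic variance is 1/12 under the antithetic kernel versus 1/4 under i.i.d. sampling, and $Q-P$ has eigenvalues 0 and 1/2. This is consistent with antithetic MCMC being more efficient than i.i.d. sampling.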
-
Sampling via Rejection-Free Partial Neighbor Search
Authors:
Sigeng Chen,
Jeffrey S. Rosenthal,
Aki Dote,
Hirotaka Tamura,
Ali Sheikholeslami
Abstract:
The Metropolis algorithm produces a Markov chain that converges to a specified target density $π$. To improve its efficiency, we can use the Rejection-Free version of the Metropolis algorithm, which avoids the inefficiency of rejections by evaluating all neighbors. Rejection-Free can be made more efficient through the use of parallel hardware. However, for some specialized hardware, such as the Digital Annealing Unit, the number of units limits the number of neighbors that can be considered at each step. Hence, we propose an enhanced version of Rejection-Free known as Partial Neighbor Search, which only considers a portion of the neighbors while using the Rejection-Free technique. This method is tested on several examples to demonstrate its effectiveness and advantages under different circumstances.
Submitted 19 October, 2022;
originally announced October 2022.
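A rejection-free Metropolis step can be sketched as follows. The sketch assumes a hypothetical target $\pi(x) \propto \exp(\beta \sum_i x_i)$ on bit strings, with one-bit-flip neighbours. Every neighbour's acceptance probability is computed (the part that parallel hardware can evaluate simultaneously), and the chain jumps to a neighbour chosen proportionally to it. Each visited state is then weighted by $1/s(x)$, where $s(x)$ is the probability that the ordinary Metropolis chain leaves $x$. That weight is the expected number of steps the ordinary chain would have stayed there. This is the full-neighbour version; the Partial Neighbor Search rules for choosing a subset of neighbours are not reproduced.

```python
import math
import random


def rejection_free_estimate(n_bits: int, beta: float, n_jumps: int, seed: int = 0) -> float:
    """Estimate E_pi[sum of bits] for pi(x) proportional to exp(beta * sum(x))
    on {0,1}^n_bits with a rejection-free (jump-chain) Metropolis sampler.

    The jump chain has stationary law proportional to pi(x) * s(x), so
    weighting each visited state by 1 / s(x) recovers expectations under pi.
    """
    rng = random.Random(seed)
    x = [0] * n_bits
    weighted_sum, total_weight = 0.0, 0.0
    for _ in range(n_jumps):
        # Flipping bit i changes sum(x) by +1 if x[i] == 0 and by -1 if x[i] == 1.
        accept = [min(1.0, math.exp(beta * (1 - 2 * xi))) for xi in x]
        leave_prob = sum(accept) / n_bits  # s(x) under a uniform flip proposal
        weight = 1.0 / leave_prob  # expected multiplicity of the current state
        weighted_sum += weight * sum(x)
        total_weight += weight
        # Jump to a neighbour chosen proportionally to its acceptance probability.
        i = rng.choices(range(n_bits), weights=accept)[0]
        x[i] = 1 - x[i]
    return weighted_sum / total_weight
```

The bits are independent under this target, so the exact answer $n\,e^\beta/(1+e^\beta)$ (about 2.193 for `n_bits=3`, `beta=1.0`) is available for checking the weighted estimate.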
-
Football Group Draw Probabilities and Corrections
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
This paper considers the challenge of designing football group draw mechanisms which have the uniform distribution over all valid draw assignments, but are also entertaining, practical, and transparent. We explain how to simulate the FIFA Sequential Draw method and compute the non-uniformity of its draws by comparison with a uniform Rejection Sampler. We then propose two practical methods of achieving the uniform distribution while still using balls and bowls in a way which is suitable for a televised draw. The solutions can also be tried interactively.
Submitted 25 January, 2023; v1 submitted 12 May, 2022;
originally announced May 2022.
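A uniform Rejection Sampler of the kind used as the benchmark can be sketched in a few lines of Python. Shuffle every pot independently, read off the groups, and start over whenever a constraint is violated. Conditioning a uniform draw on validity gives the uniform distribution over valid draws. The pot layout, team names, and the single constraint used here (no two teams from one confederation in a group) are hypothetical simplifications of the real FIFA rules.

```python
import random


def rejection_sample_draw(pots, confed, seed=None, max_tries=100_000):
    """Uniformly sample a valid group draw by rejection.

    pots: list of pots, each a list of team names of length G (number of groups).
    confed: dict mapping team -> confederation.
    Assumed validity rule: no two teams from the same confederation in a group.
    Shuffling every pot independently is uniform over all assignments, so
    keeping only valid attempts is uniform over valid assignments.
    """
    rng = random.Random(seed)
    n_groups = len(pots[0])
    for _ in range(max_tries):
        shuffled = [rng.sample(pot, len(pot)) for pot in pots]
        groups = [[pot[g] for pot in shuffled] for g in range(n_groups)]
        if all(len({confed[t] for t in group}) == len(group) for group in groups):
            return groups
    raise RuntimeError("no valid draw found; constraints may be infeasible")
```

For example, with made-up pots `[["ESP", "FRA", "BRA", "USA"], ["GER", "ARG", "JPN", "SEN"]]`, every returned draw keeps GER away from ESP and FRA and keeps ARG away from BRA. Repeating the draw many times and tabulating pairings gives the uniform reference distribution against which a sequential procedure can be compared.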
-
Optimization via Rejection-Free Partial Neighbor Search
Authors:
Sigeng Chen,
Jeffrey S. Rosenthal,
Aki Dote,
Hirotaka Tamura,
Ali Sheikholeslami
Abstract:
Simulated Annealing using Metropolis steps at decreasing temperatures is widely used to solve complex combinatorial optimization problems. To improve its efficiency, we can use the Rejection-Free version of the Metropolis algorithm, which avoids the inefficiency of rejections by considering all the neighbors at every step. To keep the algorithm from becoming stuck in local optima, we propose an enhanced version of Rejection-Free called Partial Neighbor Search (PNS), which only considers random parts of the neighbors while applying Rejection-Free. We demonstrate the superior performance of the Rejection-Free PNS algorithm by applying these methods to several examples, such as the QUBO problem, the Knapsack problem, the 3R3XOR problem, and quadratic programming.
Submitted 7 October, 2022; v1 submitted 15 April, 2022;
originally announced May 2022.
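For optimization, the rejection-free idea can drive simulated annealing. The Python sketch below minimises a QUBO energy $E(x)=\sum_{i,j}Q_{ij}x_ix_j$. At each temperature, every single-bit flip is scored with its Metropolis acceptance probability, and one flip is chosen proportionally to those scores. It considers all neighbours rather than a random subset, so it is the baseline Rejection-Free method rather than PNS. The geometric cooling schedule, its end points, and the function names are assumptions.

```python
import itertools
import math
import random


def qubo_energy(Q, x):
    """E(x) = sum_{i,j} Q[i][j] * x_i * x_j for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def flip_delta(Q, x, i):
    """Energy change from flipping bit i (uses x_i**2 == x_i for binary x)."""
    field = Q[i][i] + sum((Q[i][j] + Q[j][i]) * x[j] for j in range(len(x)) if j != i)
    return (1 - 2 * x[i]) * field


def rejection_free_annealing(Q, n_steps, t_start=2.0, t_end=0.05, seed=0):
    """Return (best bit vector found, its energy)."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]
    energy = qubo_energy(Q, x)
    best_x, best_e = x[:], energy
    for k in range(n_steps):
        # Geometric cooling schedule (an illustrative choice).
        temp = t_start * (t_end / t_start) ** (k / max(1, n_steps - 1))
        deltas = [flip_delta(Q, x, i) for i in range(n)]
        # min(1, exp(-delta/T)) = exp(-max(delta, 0)/T); dividing every weight by
        # the largest one keeps proportions and prevents all weights underflowing.
        uphill = [max(d, 0.0) for d in deltas]
        m = min(uphill)
        weights = [math.exp(-(u - m) / temp) for u in uphill]
        i = rng.choices(range(n), weights=weights)[0]
        x[i] = 1 - x[i]
        energy += deltas[i]
        if energy < best_e:
            best_x, best_e = x[:], energy
    return best_x, best_e


def brute_force_min(Q):
    """Exhaustive minimum energy, feasible only for tiny instances."""
    return min(qubo_energy(Q, list(bits)) for bits in itertools.product((0, 1), repeat=len(Q)))
```

On a small random instance, `brute_force_min` gives a reference value to compare against. A PNS variant would replace the full list of flips with a randomly chosen subset at each step.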
-
Equivalences of Geometric Ergodicity of Markov Chains
Authors:
M. A. Gallegos-Herrada,
D. Ledvinka,
J. S. Rosenthal
Abstract:
This paper gathers together different conditions which are all equivalent to geometric ergodicity of time-homogeneous Markov chains on general state spaces. A total of 34 different conditions are presented (27 for general chains plus 7 just for reversible chains), some old and some new, in terms of such notions as convergence bounds, drift conditions, spectral properties, etc., with different assumptions about the distance metric used, finiteness of function moments, initial distribution, uniformity of bounds, and more. Proofs of the connections between the different conditions are provided, mostly self-contained but using some results from the literature where appropriate.
Submitted 3 July, 2023; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Optimal Strategies and Rules for the Game of Horse
Authors:
Daniel Rosenthal,
Jeffrey S. Rosenthal
Abstract:
We investigate the probability of scoring a point when playing the basketball shooting game called "Horse". We show that under the Traditional Rules, it is optimal to choose very easy shots. We propose alternative rules called Pops Rules, and show that they lead to more difficult optimal shots, and thus to a more interesting game.
Submitted 17 January, 2022;
originally announced January 2022.
-
Convergence rate bounds for iterative random functions using one-shot coupling
Authors:
Sabrina Sixta,
Jeffrey S. Rosenthal
Abstract:
One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance, which was first introduced by Roberts and Rosenthal and generalized by Madras and Sezer. The method is divided into two parts: the contraction phase, when the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables like a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method in the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic (ARCH) process. We provide multiple examples of how the theorem can be used on various models including ones in high dimensions. These examples illustrate how the theorem's conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.
Submitted 1 July, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Bayesian Inference of Globular Cluster Properties Using Distribution Functions
Authors:
Gwendolyn M. Eadie,
Jeremy J. Webb,
Jeffrey S. Rosenthal
Abstract:
We present a Bayesian inference approach to estimating the cumulative mass profile and mean squared velocity profile of a globular cluster given the spatial and kinematic information of its stars. Mock globular clusters with a range of sizes and concentrations are generated from lowered isothermal dynamical models, from which we test the reliability of the Bayesian method to estimate model parameters through repeated statistical simulation. We find that given unbiased star samples, we are able to reconstruct the cluster parameters used to generate the mock cluster and the cluster's cumulative mass and mean velocity squared profiles with good accuracy. We further explore how strongly biased sampling, which could be the result of observing constraints, may affect this approach. Our tests indicate that if we instead have biased samples, then our estimates can be off in certain ways that are dependent on cluster morphology. Overall, our findings motivate obtaining samples of stars that are as unbiased as possible. This may be achieved by combining information from multiple telescopes (e.g., Hubble and Gaia), but will require careful modeling of the measurement uncertainties through a hierarchical model, which we plan to pursue in future work.
Submitted 30 August, 2021;
originally announced August 2021.
-
Dimension-free Mixing for High-dimensional Bayesian Variable Selection
Authors:
Quan Zhou,
Jun Yang,
Dootika Vats,
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
Yang et al. (2016) proved that the symmetric random walk Metropolis--Hastings algorithm for Bayesian variable selection is rapidly mixing under mild high-dimensional assumptions. We propose a novel MCMC sampler using an informed proposal scheme, which we prove achieves a much faster mixing time that is independent of the number of covariates, under the same assumptions. To the best of our knowledge, this is the first high-dimensional result which rigorously shows that the mixing rate of informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation. Motivated by the theoretical analysis of our sampler, we further propose a new approach called "two-stage drift condition" to studying convergence rates of Markov chains on general state spaces, which can be useful for obtaining tight complexity bounds in high-dimensional settings. The practical advantages of our algorithm are illustrated by both simulation studies and real data analysis.
Submitted 23 April, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Sampling by Divergence Minimization
Authors:
Ameer Dharamshi,
Vivian Ngo,
Jeffrey S. Rosenthal
Abstract:
We introduce a Markov Chain Monte Carlo (MCMC) method that is designed to sample from target distributions with irregular geometry using an adaptive scheme. In cases where targets exhibit non-Gaussian behaviour, we propose that adaptation should be regional rather than global. Our algorithm minimizes the information projection component of the Kullback-Leibler (KL) divergence between the proposal and target distributions to encourage proposals that are distributed similarly to the regional geometry of the target. Unlike traditional adaptive MCMC, this procedure rapidly adapts to the geometry of the target's current position as it explores the surrounding space without the need for many preexisting samples. The divergence minimization algorithms are tested on target distributions with irregularly shaped modes and we provide results demonstrating the effectiveness of our methods.
Submitted 6 May, 2022; v1 submitted 2 May, 2021;
originally announced May 2021.
-
Convergence Rates of Attractive-Repulsive MCMC Algorithms
Authors:
Yu Hang Jiang,
Tong Liu,
Zhiya Lou,
Jeffrey S. Rosenthal,
Shanshan Shangguan,
Fei Wang,
Zixuan Wu
Abstract:
We consider MCMC algorithms for certain particle systems which include both attractive and repulsive forces, making their convergence analysis challenging. We prove that a version of these algorithms on a bounded state space is uniformly ergodic with an explicit quantitative convergence rate. We also prove that a version on an unbounded state space is still geometrically ergodic, and then use the method of shift-coupling to obtain an explicit quantitative bound on its convergence rate.
Submitted 1 September, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
MCMC Confidence Intervals and Biases
Authors:
Yu Hang Jiang,
Tong Liu,
Zhiya Lou,
Jeffrey S. Rosenthal,
Shanshan Shangguan,
Fei Wang,
Zixuan Wu
Abstract:
The recent paper "Simple confidence intervals for MCMC without CLTs" by J.S. Rosenthal showed how to derive a simple MCMC confidence interval using only Chebyshev's inequality, not the CLT. That result required certain assumptions about how the estimator bias and variance grow with the number of iterations $n$; in particular, it assumed that the bias is $o(1/\sqrt{n})$. This assumption seemed mild: it is generally believed that the estimator bias will be $O(1/n)$ and hence $o(1/\sqrt{n})$. However, researchers raised questions about how to verify this assumption, and indeed we show that it might not always hold. In this paper, we seek to simplify and weaken the assumptions in the previously mentioned paper, to make MCMC confidence intervals without CLTs more widely applicable.
Submitted 29 June, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
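The Chebyshev-type argument can be sketched as follows. Write $\hat\mu_n$ for the MCMC estimate of $\mu$ after $n$ iterations, $b_n$ for its bias and $v_n$ for its variance; this notation is introduced here, not taken from either paper. Markov's inequality applied to the squared error gives

$$
\Pr\big(|\hat\mu_n-\mu|\ge\varepsilon\big)\;\le\;\frac{\mathbb{E}\big[(\hat\mu_n-\mu)^2\big]}{\varepsilon^2}\;=\;\frac{v_n+b_n^2}{\varepsilon^2},
\qquad
\varepsilon=\sqrt{\frac{v_n+b_n^2}{\alpha}}\;\Longrightarrow\;\Pr\big(|\hat\mu_n-\mu|<\varepsilon\big)\ge 1-\alpha .
$$

If $v_n \approx \sigma^2/n$, the bias term is negligible exactly when $b_n^2 = o(1/n)$, i.e. $b_n = o(1/\sqrt{n})$, which is why that condition matters. The interval is then approximately $\hat\mu_n \pm \sigma/\sqrt{\alpha n}$, with multiplier $1/\sqrt{0.05}\approx 4.47$ for $\alpha = 0.05$ in place of the CLT value 1.96.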
-
Introducing a new high-resolution handwritten digits data set with writer characteristics
Authors:
Cédric Beaulac,
Jeffrey S. Rosenthal
Abstract:
The contributions in this article are two-fold. First, we introduce a new handwritten digit data set that we collected. It contains high-resolution images of handwritten digits together with various writer characteristics which are not available in the well-known MNIST database. The multiple writer characteristics gathered are a novelty of our data set and create new research opportunities. The data set is publicly available online. Second, we analyse this new data set. We begin with simple supervised tasks. We assess the predictability of the writer characteristics gathered, the effect of using some of those characteristics as predictors in classification tasks, and the effect of higher-resolution images on classification accuracy. We also explore semi-supervised applications; we can leverage the large quantity of handwritten digit data sets already available online to improve the accuracy of various classification tasks with noticeable success. Finally, we also demonstrate the generative perspective offered by this new data set; we are able to generate images that mimic the writing style of specific writers. The data set has unique and distinct features, and our analysis establishes benchmarks and showcases some of the new opportunities made possible by this new data set.
Submitted 13 April, 2022; v1 submitted 4 November, 2020;
originally announced November 2020.
-
Skew Brownian Motion and Complexity of the ALPS Algorithm
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal,
Nicholas G. Tawn
Abstract:
Simulated tempering is a popular method of allowing MCMC algorithms to move between modes of a multimodal target density $π$. The paper [24] introduced the Annealed Leap-Point Sampler (ALPS) to allow for rapid movement between modes. In this paper, we prove that, under appropriate assumptions, a suitably scaled version of the ALPS algorithm converges weakly to skew Brownian motion. Our results show that, under appropriate assumptions, the ALPS algorithm mixes in time $O(d[\log(d)]^2)$ or $O(d)$, depending on which version is used.
Submitted 12 May, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
-
The Coupling/Minorization/Drift Approach to Markov Chain Convergence Rates
Authors:
Yu Hang Jiang,
Tong Liu,
Zhiya Lou,
Jeffrey S. Rosenthal,
Shanshan Shangguan,
Fei Wang,
Zixuan Wu
Abstract:
This review paper provides an introduction to Markov chains and their convergence rates, an important and interesting mathematical topic that also has important applications to the very widely used Markov chain Monte Carlo (MCMC) algorithms. We first discuss eigenvalue analysis for Markov chains on finite state spaces. Then, using the coupling construction, we prove two quantitative bounds based on minorization and drift conditions, and provide descriptive and intuitive examples to showcase how these theorems can be implemented in practice. This paper is meant to provide a general overview of the subject and spark interest in new Markov chain research areas.
Submitted 1 September, 2021; v1 submitted 24 August, 2020;
originally announced August 2020.
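The finite-state eigenvalue bound and the minorization bound can be compared numerically with the exact total variation distance $\|P^n(x,\cdot)-\pi\|_{TV}$. The Python sketch below computes all three for a small reversible chain. The spectral bound is $\tfrac12\sqrt{(1-\pi(x))/\pi(x)}\,\lambda_*^n$, where $\lambda_*$ is the largest absolute eigenvalue other than 1. The Doeblin minorization bound is $(1-\varepsilon)^n$ with $\varepsilon=\sum_y\min_z P(z,y)$. The function name and the three-state Metropolis chain in the usage note are assumptions for illustration. Drift conditions, which matter on unbounded spaces, are not covered.

```python
import numpy as np


def tv_and_bounds(P: np.ndarray, pi: np.ndarray, x: int, n_max: int):
    """For a chain P reversible w.r.t. pi, started at state x, return arrays of
    (exact TV distance, spectral bound, Doeblin bound) for n = 0..n_max."""
    d = np.sqrt(pi)
    S = (d[:, None] * P) / d[None, :]  # symmetric because P is reversible
    abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(0.5 * (S + S.T))))
    lam = abs_eigs[-2]  # largest |eigenvalue| once the eigenvalue 1 is dropped
    eps = P.min(axis=0).sum()  # P(z, .) >= eps * nu(.) for every z
    mu = np.zeros(len(pi))
    mu[x] = 1.0  # point mass at the starting state
    tv, spectral, doeblin = [], [], []
    for n in range(n_max + 1):
        tv.append(0.5 * np.abs(mu - pi).sum())
        spectral.append(0.5 * np.sqrt((1 - pi[x]) / pi[x]) * lam**n)
        doeblin.append((1 - eps) ** n)
        mu = mu @ P  # distribution after one more step
    return np.array(tv), np.array(spectral), np.array(doeblin)
```

For `P = [[0.5, 0.3, 0.2], [0.5, 1/6, 1/3], [0.5, 0.5, 0]]` with `pi = (0.5, 0.3, 0.2)` and start `x = 2`, both bounds equal $(1/3)^n$. The exact distance is $0.6\cdot(1/3)^n$ for $n \ge 1$, so here the bounds are tight up to a constant factor.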
-
An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial: A report from the Children's Oncology Group
Authors:
Cédric Beaulac,
Jeffrey S. Rosenthal,
Qinglin Pei,
Debra Friedman,
Suzanne Wolden,
David Hodgson
Abstract:
In this manuscript we analyze a data set containing information on children with Hodgkin Lymphoma (HL) enrolled on a clinical trial. Treatments received and survival status were collected together with other covariates such as demographics and clinical measurements. Our main task is to explore the potential of machine learning (ML) algorithms in a survival analysis context in order to improve over the Cox Proportional Hazards (CoxPH) model. We discuss the weaknesses of the CoxPH model we would like to improve upon and then we introduce multiple algorithms, from well-established ones to state-of-the-art models, that address these issues. We then compare every model according to the concordance index and the Brier score. Finally, we produce a series of recommendations, based on our experience, for practitioners who would like to benefit from the recent advances in artificial intelligence.
Submitted 26 March, 2021; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Jump Markov Chains and Rejection-Free Metropolis Algorithms
Authors:
J. S. Rosenthal,
A. Dote,
K. Dabiri,
H. Tamura,
S. Chen,
A. Sheikholeslami
Abstract:
We consider versions of the Metropolis algorithm which avoid the inefficiency of rejections. We first illustrate that a natural Uniform Selection Algorithm might not converge to the correct distribution. We then analyse the use of Markov jump chains which avoid successive repetitions of the same state. After exploring the properties of jump chains, we show how they can exploit parallelism in computer hardware to produce more efficient samples. We apply our results to the Metropolis algorithm, to Parallel Tempering, to a Bayesian model, to a two-dimensional ferromagnetic 4 x 4 Ising model, and to a pseudo-marginal MCMC algorithm.
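The jump-chain idea can be sketched on a small finite state space. This is a minimal sketch under our own assumptions (uniform neighbour proposals, and weighting each visited state by its expected holding time 1/p(x) rather than a sampled multiplicity); it is not claimed to be the paper's exact estimator, and `rejection_free_estimate` is a hypothetical name. The weighting works because the jump chain's stationary distribution is proportional to π(x)p(x).

```python
import math
import random


def rejection_free_estimate(log_pi, neighbors, x0, n_steps, f, seed=0):
    """Estimate E_pi[f] with a jump-chain version of Metropolis.

    Underlying chain: propose a uniformly chosen neighbour y of x and accept
    with probability min(1, pi(y) / pi(x)). The jump chain never stays put:
    it moves to y with probability P(x, y) / p(x), where p(x) = sum_y P(x, y)
    is the probability of leaving x. Each visited state is weighted by 1 / p(x),
    the expected number of consecutive steps the original chain spends there.
    """
    rng = random.Random(seed)
    x = x0
    weighted_sum = 0.0
    total_weight = 0.0
    for _ in range(n_steps):
        nbrs = neighbors(x)
        # Metropolis transition probabilities P(x, y) to each neighbour.
        probs = [math.exp(min(0.0, log_pi(y) - log_pi(x))) / len(nbrs)
                 for y in nbrs]
        p_leave = sum(probs)
        weight = 1.0 / p_leave
        weighted_sum += weight * f(x)
        total_weight += weight
        # Jump directly to the next distinct state; no rejections are simulated.
        x = rng.choices(nbrs, weights=probs, k=1)[0]
    return weighted_sum / total_weight
```

For example, on a 5-cycle with π(s) proportional to s + 1, `rejection_free_estimate(lambda s: math.log(s + 1), lambda s: [(s - 1) % 5, (s + 1) % 5], 0, 20000, lambda s: s)` should land close to the exact mean 40/15 ≈ 2.67.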
Submitted 28 October, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Optimal Scaling of Random-Walk Metropolis Algorithms on General Target Distributions
Authors:
Jun Yang,
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
One main limitation of the existing optimal scaling results for Metropolis-Hastings algorithms is that the assumptions on the target distribution are unrealistic. In this paper, we consider optimal scaling of random-walk Metropolis algorithms on general target distributions in high dimensions arising from practical MCMC models from Bayesian statistics. For optimal scaling by maximizing expected squared jumping distance (ESJD), we show the asymptotically optimal acceptance rate 0.234 can be obtained under general realistic sufficient conditions on the target distribution. The new sufficient conditions are easy to verify and may hold for some general classes of MCMC models arising from Bayesian statistics applications, which substantially generalize the product i.i.d. condition required in most of the existing optimal scaling literature. Furthermore, we show one-dimensional diffusion limits can be obtained under slightly stronger conditions, which still allow dependent coordinates of the target distribution. We also connect the new diffusion limit results to complexity bounds of Metropolis algorithms in high dimensions.
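To see the 0.234 figure in the simplest setting, the sketch below runs random-walk Metropolis on the classical i.i.d. product case (a standard Gaussian in d dimensions, which the paper's conditions generalize) with proposal standard deviation ell/sqrt(d). The choice ell = 2.38 is the commonly quoted constant for this Gaussian case and is an assumption of the sketch; `rwm_acceptance_rate` is a hypothetical name.

```python
import math
import random


def rwm_acceptance_rate(d, ell=2.38, n_iter=10000, seed=1):
    """Empirical acceptance rate of random-walk Metropolis on N(0, I_d)
    with proposal y = x + (ell / sqrt(d)) * Z, Z ~ N(0, I_d)."""
    rng = random.Random(seed)
    sigma = ell / math.sqrt(d)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # start in stationarity
    log_pi_x = -0.5 * sum(v * v for v in x)
    accepted = 0
    for _ in range(n_iter):
        y = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        log_pi_y = -0.5 * sum(v * v for v in y)
        # Metropolis accept/reject; min() guards against overflow in exp.
        if rng.random() < math.exp(min(0.0, log_pi_y - log_pi_x)):
            x, log_pi_x = y, log_pi_y
            accepted += 1
    return accepted / n_iter
```

Because the proposal variance shrinks like 1/d, the acceptance rate stays roughly constant (near 0.234) as the dimension grows instead of collapsing to zero.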
Submitted 4 May, 2020; v1 submitted 27 April, 2019;
originally announced April 2019.
-
Simple Confidence Intervals for MCMC Without CLTs
Authors:
Jeffrey S. Rosenthal
Abstract:
This short note argues that 95% confidence intervals for MCMC estimates can be obtained even without establishing a CLT, by multiplying their widths by 2.3.
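The abstract does not say where the factor 2.3 comes from. One plausible reading, stated here as an assumption rather than as the note's argument, is Chebyshev's inequality: a half-width of 1/sqrt(0.05) ≈ 4.47 standard errors guarantees at least 95% coverage for any finite-variance estimator, and 4.47/1.96 ≈ 2.28. The sketch takes the standard error as given and uses hypothetical function names.

```python
import math
from statistics import NormalDist


def chebyshev_to_clt_ratio(level=0.95):
    """Ratio of the Chebyshev half-width multiplier to the Gaussian one.

    Chebyshev: P(|X - mu| >= k * sd) <= 1 / k**2 for any finite-variance X,
    so k = 1 / sqrt(1 - level) guarantees coverage of at least `level`.
    """
    k_chebyshev = 1.0 / math.sqrt(1.0 - level)
    z_clt = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return k_chebyshev / z_clt


def widened_interval(estimate, std_error, factor=2.3):
    """Usual 95% interval estimate +/- 1.96 * SE, with its width scaled by `factor`."""
    half_width = factor * 1.96 * std_error
    return estimate - half_width, estimate + half_width
```

For an estimate of 10.0 with standard error 1.0, `widened_interval(10.0, 1.0)` returns roughly (5.492, 14.508) in place of the CLT interval (8.04, 11.96).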
Submitted 30 November, 2018;
originally announced December 2018.
-
A Deep Latent-Variable Model Application to Select Treatment Intensity in Survival Analysis
Authors:
Cédric Beaulac,
Jeffrey S. Rosenthal,
David Hodgson
Abstract:
In the following short article we adapt a new and popular machine learning model for inference on medical data sets. Our method is based on the Variational AutoEncoder (VAE) framework that we adapt to survival analysis on small data sets with missing values. In our model, the true health status appears as a set of latent variables that affects the observed covariates and the survival chances. We show that this flexible model allows insightful decision-making using a predicted distribution and outperforms a classic survival analysis model.
Submitted 29 November, 2018;
originally announced November 2018.
-
Weight-Preserving Simulated Tempering
Authors:
Nicholas G. Tawn,
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
Simulated tempering is a popular method of allowing MCMC algorithms to move between modes of a multimodal target density π. One problem with simulated tempering for multimodal targets is that the weights of the various modes change for different inverse-temperature values, sometimes dramatically so. In this paper, we provide a fix to overcome this problem, by adjusting the mode weights to be preserved (i.e., constant) over different inverse-temperature settings. We then apply simulated tempering algorithms to multimodal targets using our mode weight correction. We present simulations in which our weight-preserving algorithm mixes between modes much more successfully than traditional tempering algorithms. We also prove a diffusion limit for a version of our algorithm, which shows that under appropriate assumptions, our algorithm mixes in time O(d [log d]^2).
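The problem the abstract describes, mode weights shifting with the inverse temperature, can be checked numerically. The sketch below (our own illustration of the problem, not the paper's weight-preserving fix) integrates π(x)^β for a two-component Gaussian mixture on a grid. For well-separated Gaussian components, the mass of mode i under π^β is proportional to w_i^β σ_i^(1-β), so a wide mode gains mass as β decreases. `tempered_mode_masses` is a hypothetical name.

```python
import math


def tempered_mode_masses(weights, means, sds, beta, lo=-60.0, hi=60.0, n=120001):
    """Relative mass of each mode of pi(x)^beta, where pi is a mixture of
    well-separated 1-D Gaussians, via a Riemann sum on a grid. Each grid point
    is credited to the mixture component with the largest density there."""
    def component(i, x):
        z = (x - means[i]) / sds[i]
        return weights[i] * math.exp(-0.5 * z * z) / (sds[i] * math.sqrt(2.0 * math.pi))

    h = (hi - lo) / (n - 1)
    mass = [0.0] * len(weights)
    for j in range(n):
        x = lo + j * h
        parts = [component(i, x) for i in range(len(weights))]
        total = sum(parts)
        if total > 0.0:
            mass[parts.index(max(parts))] += total ** beta * h
    s = sum(mass)
    return [m / s for m in mass]
```

With weights (0.5, 0.5), means (-20, 20) and standard deviations (1, 4), β = 1 gives about [0.5, 0.5], while β = 0.5 moves roughly two thirds of the mass to the wider mode, as w_i^β σ_i^(1-β) predicts.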
Submitted 11 February, 2019; v1 submitted 14 August, 2018;
originally announced August 2018.
-
Bayesian Spatial Analysis of Hardwood Tree Counts in Forests via MCMC
Authors:
Reihaneh Entezari,
Patrick E. Brown,
Jeffrey S. Rosenthal
Abstract:
In this paper, we perform Bayesian inference to analyze spatial tree count data from the Timiskaming and Abitibi River forests in Ontario, Canada. We consider a Bayesian Generalized Linear Geostatistical Model and implement a Markov Chain Monte Carlo algorithm to sample from its posterior distribution. We study how spatial predictions for new sites in the forests change as the amount of training data is reduced, and compare them with those of a Logistic Regression model without a spatial effect. Finally, we discuss a stratified sampling approach for selecting subsets of data that allows for potentially better predictions.
Submitted 3 July, 2018;
originally announced July 2018.
-
BEST : A decision tree algorithm that handles missing values
Authors:
Cédric Beaulac,
Jeffrey S. Rosenthal
Abstract:
The main contribution of this paper is the development of a new decision tree algorithm. The proposed approach allows users to guide the algorithm through the data partitioning process. We believe this feature has many applications but in this paper we demonstrate how to utilize this algorithm to analyse data sets containing missing values. We tested our algorithm against simulated data sets with various missing data structures and a real data set. The results demonstrate that this new classification procedure efficiently handles missing values and produces results that are slightly more accurate and more interpretable than most common procedures without any imputations or pre-processing.
Submitted 14 April, 2020; v1 submitted 26 April, 2018;
originally announced April 2018.
-
Predicting University Students' Academic Success and Major using Random Forests
Authors:
Cédric Beaulac,
Jeffrey S. Rosenthal
Abstract:
In this article, a large data set containing every course taken by every undergraduate student in a major university in Canada over 10 years is analysed. Modern machine learning algorithms can use large data sets to build useful tools for the data provider, in this case, the university. In this article, two classifiers are constructed using random forests. To begin, the first two semesters of courses completed by a student are used to predict whether they will obtain an undergraduate degree. Second, for the students who completed a program, their major is predicted, again using the first few courses they registered for. A classification tree is an intuitive and powerful classifier, and building a random forest of trees improves this classifier. Random forests also allow for reliable variable importance measurements. These measures explain which variables are useful to the classifiers and can be used to better understand what is statistically related to the students' situation. The results are two accurate classifiers and a variable importance analysis that provides useful information to university administrations.
Submitted 12 January, 2019; v1 submitted 9 February, 2018;
originally announced February 2018.
-
Complexity Results for MCMC derived from Quantitative Bounds
Authors:
Jun Yang,
Jeffrey S. Rosenthal
Abstract:
This paper considers how to obtain MCMC quantitative convergence bounds which can be translated into tight complexity bounds in high-dimensional settings. We propose a modified drift-and-minorization approach, which establishes generalized drift conditions defined in subsets of the state space. The subsets are called the "large sets", and are chosen to rule out some "bad" states which have poor drift properties when the dimension of the state space gets large. Using the "large sets" together with a "fitted family of drift functions", a quantitative bound can be obtained which can be translated into a tight complexity bound. As a demonstration, we analyze several Gibbs samplers and obtain complexity upper bounds for the mixing time. In particular, for one example of Gibbs sampler which is related to the James-Stein estimator, we show that the number of iterations required for the Gibbs sampler to converge is constant under certain conditions on the observed data and the initial state. It is our hope that this modified drift-and-minorization approach can be employed in many other specific examples to obtain complexity bounds for high-dimensional Markov chains.
Submitted 10 May, 2022; v1 submitted 2 August, 2017;
originally announced August 2017.
-
Approximations of Geometrically Ergodic Reversible Markov Chains
Authors:
Jeffrey Negrea,
Jeffrey S. Rosenthal
Abstract:
A common tool in the practice of Markov Chain Monte Carlo is to use approximating transition kernels to speed up computation when the desired kernel is slow to evaluate or intractable. A limited set of quantitative tools exist to assess the relative accuracy and efficiency of such approximations. We derive a set of tools for such analysis based on the Hilbert space generated by the stationary distribution we intend to sample, $L_2(π)$. Our results apply to approximations of reversible chains which are geometrically ergodic, as is typically the case for applications to Markov Chain Monte Carlo. The focus of our work is on determining whether the approximating kernel will preserve the geometric ergodicity of the exact chain, and whether the approximating stationary distribution will be close to the original stationary distribution. For reversible chains, our results extend the results of Johndrow et al. [18] from the uniformly ergodic case to the geometrically ergodic case, under some additional regularity conditions. We then apply our results to a number of approximate MCMC algorithms.
Submitted 20 January, 2021; v1 submitted 23 February, 2017;
originally announced February 2017.
-
MEXIT: Maximal un-coupling times for stochastic processes
Authors:
P. A. Ernst,
W. S. Kendall,
G. O. Roberts,
J. S. Rosenthal
Abstract:
Classical coupling constructions arrange for copies of the same Markov process started at two different initial states to become equal as soon as possible. In this paper, we consider an alternative coupling framework in which one seeks to arrange for two different Markov (or other stochastic) processes to remain equal for as long as possible, when started in the same state. We refer to this "un-coupling" or "maximal agreement" construction as MEXIT, standing for "maximal exit". After highlighting the importance of un-coupling arguments in a few key statistical and probabilistic settings, we develop an explicit MEXIT construction for stochastic processes in discrete time with countable state space. This construction is generalized to random processes on general state spaces running in continuous time, and then exemplified by a discussion of MEXIT for Brownian motions with two different constant drifts.
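The one-step building block behind "remaining equal for as long as possible" is the classical maximal coupling: two distributions p and q on a countable set can be coupled so that P(X = Y) = Σ_i min(p_i, q_i) = 1 - TV(p, q), and no coupling does better. The sketch below shows only this single-step case on a finite set; it is not the paper's full multi-step MEXIT construction, and `maximal_coupling` is a hypothetical name.

```python
import random


def maximal_coupling(p, q, rng):
    """Draw (X, Y) with X ~ p and Y ~ q on {0, ..., m-1}, maximising
    P(X = Y), which then equals sum_i min(p_i, q_i) = 1 - TV(p, q)."""
    m = len(p)
    states = list(range(m))
    overlap = [min(p[i], q[i]) for i in range(m)]
    alpha = sum(overlap)
    if rng.random() < alpha:
        # Agreement branch: draw a common value from the normalised overlap.
        z = rng.choices(states, weights=overlap, k=1)[0]
        return z, z
    # Disagreement branch: the residuals p - overlap and q - overlap have
    # disjoint supports, so X != Y here.
    x = rng.choices(states, weights=[p[i] - overlap[i] for i in range(m)], k=1)[0]
    y = rng.choices(states, weights=[q[i] - overlap[i] for i in range(m)], k=1)[0]
    return x, y
```

For p = [0.5, 0.3, 0.2] and q = [0.2, 0.3, 0.5], the overlap has mass 0.7, so repeated draws agree about 70% of the time while X and Y keep their correct marginals.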
Submitted 30 December, 2018; v1 submitted 13 February, 2017;
originally announced February 2017.
-
Hitting Time and Convergence Rate Bounds for Symmetric Langevin Diffusions
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We provide quantitative bounds on the convergence to stationarity of real-valued Langevin diffusions with symmetric target densities.
Submitted 9 November, 2016;
originally announced November 2016.
-
Likelihood Inflating Sampling Algorithm
Authors:
Reihaneh Entezari,
Radu V. Craiu,
Jeffrey S. Rosenthal
Abstract:
Markov Chain Monte Carlo (MCMC) sampling from a posterior distribution corresponding to a massive data set can be computationally prohibitive since producing one sample requires a number of operations that is linear in the data size. In this paper, we introduce a new communication-free parallel method, the Likelihood Inflating Sampling Algorithm (LISA), that significantly reduces computational costs by randomly splitting the dataset into smaller subsets and running MCMC methods independently in parallel on each subset using different processors. Each processor will be used to run an MCMC chain that samples sub-posterior distributions which are defined using an "inflated" likelihood function. We develop a strategy for combining the draws from different sub-posteriors to study the full posterior of the Bayesian Additive Regression Trees (BART) model. The performance of the method is tested using both simulated and real data.
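To see why inflating the likelihood helps, consider a conjugate normal-mean model where everything is available in closed form. We assume here (the abstract does not give the exact definition) that "inflating" means raising each subset's likelihood to the power K, the number of equal-sized subsets. Each sub-posterior then has the same precision as the full posterior, and the average of the sub-posterior means equals the full posterior mean. The paper's combination strategy for BART is not reproduced, and `normal_mean_posterior` is a hypothetical name.

```python
def normal_mean_posterior(data, sigma2, tau2, inflation=1.0):
    """Posterior of mu under prior N(0, tau2) and likelihood
    prod_i N(x_i | mu, sigma2), with the likelihood raised to `inflation`.
    Returns (posterior mean, posterior precision)."""
    precision = 1.0 / tau2 + inflation * len(data) / sigma2
    mean = inflation * sum(data) / sigma2 / precision
    return mean, precision
```

With 120 observations split into K = 4 subsets of 30, calling `normal_mean_posterior(subset, sigma2, tau2, inflation=4)` on each subset reproduces the full-data precision exactly, which is the sense in which each sub-posterior is on the "right scale".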
Submitted 30 June, 2017; v1 submitted 6 May, 2016;
originally announced May 2016.
-
Adaptive Component-wise Multiple-Try Metropolis Sampling
Authors:
Jinyoung Yang,
Evgeny Levi,
Radu V. Craiu,
Jeffrey S. Rosenthal
Abstract:
One of the most widely used samplers in practice is the component-wise Metropolis-Hastings (CMH) sampler that updates in turn the components of a vector-valued Markov chain using accept-reject moves generated from a proposal distribution. When the target distribution of a Markov chain is irregularly shaped, a 'good' proposal distribution for one part of the state space might be a 'poor' one for another part of the state space. We consider a component-wise multiple-try Metropolis (CMTM) algorithm that can automatically choose from a set of candidate moves sampled from different distributions. The computational efficiency is increased using an adaptation rule for the CMTM algorithm that dynamically builds a better set of proposal distributions as the Markov chain runs. The ergodicity of the adaptive chain is demonstrated theoretically. The performance is studied via simulations and real data examples.
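A non-adaptive component-wise multiple-try step can be sketched as follows, under our own simplifying assumptions: one symmetric Gaussian try per proposal scale, and selection weights equal to the target density π(y). That weight choice is an assumption and may differ from the paper's, and the adaptation rule is omitted. Each coordinate draws its tries, selects one, draws reference points from the selected candidate (keeping the current value as the reference for the selected index), and accepts with the usual multiple-try ratio. The function names are hypothetical.

```python
import math
import random


def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cmtm(log_pi, x0, scales, n_iter, seed=0):
    """Component-wise multiple-try Metropolis with one Gaussian try per scale.

    For coordinate i: draw y_j ~ N(x_i, s_j^2) for each scale s_j, select
    index J with probability proportional to pi(y_j), draw reference points
    x*_l ~ N(y_J, s_l^2) for l != J (with x*_J = x_i), and accept y_J with
    probability min(1, sum_j pi(y_j) / sum_l pi(x*_l)).
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []

    def log_pi_with(i, value):
        z = x[:]  # current state with coordinate i replaced
        z[i] = value
        return log_pi(z)

    for _ in range(n_iter):
        for i in range(len(x)):
            tries = [x[i] + s * rng.gauss(0.0, 1.0) for s in scales]
            lp_tries = [log_pi_with(i, y) for y in tries]
            top = max(lp_tries)
            w = [math.exp(v - top) for v in lp_tries]  # proportional to pi(y_j)
            J = rng.choices(range(len(scales)), weights=w, k=1)[0]
            y = tries[J]
            refs = [x[i] if l == J else y + s * rng.gauss(0.0, 1.0)
                    for l, s in enumerate(scales)]
            lp_refs = [log_pi_with(i, v) for v in refs]
            log_ratio = _log_sum_exp(lp_tries) - _log_sum_exp(lp_refs)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x[i] = y
        samples.append(x[:])
    return samples
```

Offering scales such as [0.5, 2.0, 8.0] lets each coordinate of a target with very different marginal spreads (for instance standard deviations 1 and 5) pick a suitable step size without per-coordinate tuning.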
Submitted 21 March, 2017; v1 submitted 10 March, 2016;
originally announced March 2016.
-
Complexity Bounds for MCMC via Diffusion Limits
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We connect known results about diffusion limits of Markov chain Monte Carlo (MCMC) algorithms to the Computer Science notion of algorithm complexity. Our main result states that any diffusion limit of a Markov process implies a corresponding complexity bound (in an appropriate metric). We then combine this result with previously-known MCMC diffusion limit results to prove that under appropriate assumptions, the Random-Walk Metropolis (RWM) algorithm in $d$ dimensions takes $O(d)$ iterations to converge to stationarity, while the Metropolis-Adjusted Langevin Algorithm (MALA) takes $O(d^{1/3})$ iterations to converge to stationarity.
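The d^{1/3} rate for MALA corresponds to a step size shrinking like d^{-1/3}, compared with d^{-1} for random-walk Metropolis. The sketch below, which assumes a standard Gaussian target started in stationarity and uses ell = 1.65 as an illustrative tuning constant, shows that with h = ell^2 d^{-1/3} the MALA acceptance rate stays roughly stable as d grows. `mala_acceptance_rate` is a hypothetical name.

```python
import math
import random


def mala_acceptance_rate(d, ell=1.65, n_iter=3000, seed=2):
    """MALA on a standard Gaussian target in d dimensions with step size
    h = ell^2 * d^(-1/3). Returns the empirical acceptance rate."""
    rng = random.Random(seed)
    h = ell ** 2 * d ** (-1.0 / 3.0)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # start in stationarity

    def log_pi(z):
        return -0.5 * sum(v * v for v in z)

    def log_q(to, frm):
        # Log density (up to a constant) of proposing `to` from `frm`:
        # N(frm + (h/2) * grad log pi(frm), h I), with grad log pi(z) = -z.
        return -sum((t - (f - 0.5 * h * f)) ** 2 for t, f in zip(to, frm)) / (2.0 * h)

    accepted = 0
    for _ in range(n_iter):
        y = [v - 0.5 * h * v + math.sqrt(h) * rng.gauss(0.0, 1.0) for v in x]
        log_ratio = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
    return accepted / n_iter
```

Calling `mala_acceptance_rate(20)` and `mala_acceptance_rate(200)` should give broadly similar rates, even though the second problem has ten times as many coordinates.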
Submitted 3 November, 2014;
originally announced November 2014.
-
Stability of adversarial Markov chains, with an application to adaptive MCMC algorithms
Authors:
Radu V. Craiu,
Lawrence Gray,
Krzysztof Łatuszyński,
Neal Madras,
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We consider whether ergodic Markov chains with bounded step size remain bounded in probability when their transitions are modified by an adversary on a bounded subset. We provide counterexamples to show that the answer is no in general, and prove theorems to show that the answer is yes under various additional assumptions. We then use our results to prove convergence of various adaptive Markov chain Monte Carlo algorithms.
Submitted 5 November, 2015; v1 submitted 16 March, 2014;
originally announced March 2014.
-
Minimising MCMC variance via diffusion limits, with an application to simulated tempering
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We derive new results comparing the asymptotic variance of diffusions by writing them as appropriate limits of discrete-time birth-death chains which themselves satisfy Peskun orderings. We then apply our results to simulated tempering algorithms to establish which choice of inverse temperatures minimises the asymptotic variance of all functionals and thus leads to the most efficient MCMC algorithm.
Submitted 15 January, 2014;
originally announced January 2014.
-
On the efficiency of pseudo-marginal random walk Metropolis algorithms
Authors:
Chris Sherlock,
Alexandre H. Thiery,
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We examine the behaviour of the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed precisely. Under relatively general conditions on the target distribution, we obtain limiting formulae for the acceptance rate and for the expected squared jump distance, as the dimension of the target approaches infinity, under the assumption that the noise in the estimate of the log-target is additive and is independent of the position. For targets with independent and identically distributed components, we also obtain a limiting diffusion for the first component. We then consider the overall efficiency of the algorithm, in terms of both speed of mixing and computational time. Assuming the additive noise is Gaussian and is inversely proportional to the number of unbiased estimates that are used, we prove that the algorithm is optimally efficient when the variance of the noise is approximately 3.283 and the acceptance rate is approximately 7.001%. We also find that the optimal scaling is insensitive to the noise and that the optimal variance of the noise is insensitive to the scaling. The theory is illustrated with a simulation study using the particle marginal random walk Metropolis.
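The setting in the abstract (additive noise on the log-target, independent of position) can be sketched in one dimension. We assume Gaussian noise W ~ N(-σ²/2, σ²), so that exp(W) has mean 1 and the density estimate is unbiased. The key pseudo-marginal detail is that the noisy estimate at the current state is stored and reused, and only the proposal gets a fresh estimate. The abstract reports optimal efficiency near a noise variance of about 3.283; the usage below picks 1.0 purely for illustration. `pseudo_marginal_rwm` is a hypothetical name.

```python
import math
import random


def pseudo_marginal_rwm(log_pi, x0, n_iter, step, noise_var, seed=0):
    """Pseudo-marginal random-walk Metropolis sketch in one dimension.

    The exact log-target is replaced by log_pi(x) + W with
    W ~ N(-noise_var / 2, noise_var), so exp(W) has mean 1 and the density
    estimate is unbiased. The noisy value for the current state is stored
    and reused; only the proposal gets a fresh estimate.
    """
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)

    def noisy_log_target(x):
        return log_pi(x) + rng.gauss(-0.5 * noise_var, sd)

    x = x0
    est_x = noisy_log_target(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)
        est_y = noisy_log_target(y)
        if rng.random() < math.exp(min(0.0, est_y - est_x)):
            x, est_x = y, est_y  # keep the accepted estimate, never refresh it
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter
```

On a standard normal target, `pseudo_marginal_rwm(lambda x: -0.5 * x * x, 0.0, 60000, 2.4, 1.0)` still targets N(0, 1), but accepts less often than the exact-density sampler because overestimated current values make the chain "sticky".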
Submitted 30 December, 2014; v1 submitted 27 September, 2013;
originally announced September 2013.
-
The Containment Condition and AdapFail algorithms
Authors:
Krzysztof Latuszynski,
Jeffrey S. Rosenthal
Abstract:
This short note investigates convergence of adaptive MCMC algorithms, i.e. algorithms which modify the Markov chain update probabilities on the fly. We focus on the Containment condition introduced by Roberts and Rosenthal (2007). We show that if the Containment condition is not satisfied, then the algorithm will perform very poorly. Specifically, with positive probability, the adaptive algorithm will be asymptotically less efficient than any nonadaptive ergodic MCMC algorithm. We call such algorithms AdapFail, and conclude that they should not be used.
Submitted 28 December, 2013; v1 submitted 6 July, 2013;
originally announced July 2013.
-
Convergence rate of Markov chain methods for genomic motif discovery
Authors:
Dawn B. Woodard,
Jeffrey S. Rosenthal
Abstract:
We analyze the convergence rate of a simplified version of a popular Gibbs sampling method used for statistical discovery of gene regulatory binding motifs in DNA sequences. This sampler satisfies a very strong form of ergodicity (uniform). However, we show that, due to multimodality of the posterior distribution, the rate of convergence often decreases exponentially as a function of the length of the DNA sequence. Specifically, we show that this occurs whenever there is more than one true repeating pattern in the data. In practice there are typically multiple such patterns in biological data, the goal being to detect the most well-conserved and frequently-occurring of these. Our findings match empirical results, in which the motif-discovery Gibbs sampler has exhibited such poor convergence that it is used only for finding modes of the posterior distribution (candidate motifs) rather than for obtaining samples from that distribution. Ours are some of the first meaningful bounds on the convergence rate of a Markov chain method for sampling from a multimodal posterior distribution, as a function of statistical quantities like the number of observations.
Submitted 12 March, 2013;
originally announced March 2013.
-
Detecting multiple authorship of United States Supreme Court legal decisions using function words
Authors:
Jeffrey S. Rosenthal,
Albert H. Yoon
Abstract:
This paper uses statistical analysis of function words used in legal judgments written by United States Supreme Court justices, to determine which justices have the most variable writing style (which may indicate greater reliance on their law clerks when writing opinions), and also the extent to which different justices' writing styles are distinguishable from each other.
Submitted 15 April, 2011;
originally announced April 2011.
-
Adaptive Gibbs samplers and related MCMC methods
Authors:
Krzysztof Łatuszyński,
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run by learning as they go in an attempt to optimize the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.
Submitted 27 February, 2013; v1 submitted 30 January, 2011;
originally announced January 2011.
-
Adaptive Gibbs samplers
Authors:
Krzysztof Latuszynski,
Jeffrey S. Rosenthal
Abstract:
We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run, by learning as they go in an attempt to optimise the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.
Submitted 15 January, 2010;
originally announced January 2010.
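The idea of a Gibbs sampler that adapts its coordinate-selection probabilities during the run can be sketched as follows. This is a minimal illustration, not the algorithm from either paper: the standard bivariate normal target (correlation rho), the adaptation rule (move the probability of updating x[0] toward that coordinate's share of the mean squared jump), and all names are hypothetical. The sketch uses two safeguards commonly discussed in the adaptive MCMC literature, a step size shrinking like 1/n (diminishing adaptation) and probabilities kept in [eps, 1 - eps]; it makes no claim to satisfy the conditions of the papers' theorems, and the papers show that simple-looking adaptive schemes can fail to converge.

```python
import math
import random

def adaptive_random_scan_gibbs(n_iter, rho=0.9, eps=0.05, seed=0):
    """Random-scan Gibbs for a standard bivariate normal with correlation rho.

    HYPOTHETICAL adaptation rule: the probability p of updating x[0] moves toward
    x[0]'s share of the running mean squared jump, with step size 1/n, and is
    clamped to [eps, 1 - eps] so both coordinates keep being updated.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of x[i] given x[j]
    p = 0.5                                # current selection probability of x[0]
    mean_sq_jump = [1.0, 1.0]             # running averages (started at 1 to avoid 0/0)
    counts = [1, 1]
    samples = []
    for n in range(1, n_iter + 1):
        i = 0 if rng.random() < p else 1
        j = 1 - i
        new_xi = rng.gauss(rho * x[j], cond_sd)   # exact Gibbs draw of x[i] | x[j]
        jump = (new_xi - x[i]) ** 2
        x[i] = new_xi
        counts[i] += 1
        mean_sq_jump[i] += (jump - mean_sq_jump[i]) / counts[i]
        target = mean_sq_jump[0] / (mean_sq_jump[0] + mean_sq_jump[1])
        p += (target - p) / n                      # adaptation step shrinks like 1/n
        p = min(max(p, eps), 1.0 - eps)            # keep p away from 0 and 1
        samples.append((x[0], x[1]))
    return samples, p
```

For example, `samples, p = adaptive_random_scan_gibbs(50000)` returns the chain and the final selection probability; by symmetry of this target, p should stay close to 0.5.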
-
Variance bounding Markov chains
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
We introduce a new property of Markov chains, called variance bounding. We prove that, for reversible chains at least, variance bounding is weaker than, but closely related to, geometric ergodicity. Furthermore, variance bounding is equivalent to the existence of usual central limit theorems for all $L^2$ functionals. Also, variance bounding (unlike geometric ergodicity) is preserved under the Peskun order. We close with some applications to Metropolis-Hastings algorithms.
Submitted 17 June, 2008;
originally announced June 2008.
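Loosely, variance bounding controls the ratio of a functional's CLT asymptotic variance to its stationary variance Var_pi(f). A two-state chain, which is always reversible, makes this ratio explicit. This worked example is an illustration chosen here, not taken from the paper. With P(0 -> 1) = a and P(1 -> 0) = b, the non-trivial eigenvalue is lam = 1 - a - b and, for f = 1{X = 1}, Cov_pi(f(X_0), f(X_k)) = Var_pi(f) * lam**k. Summing the autocovariances therefore gives Var_pi(f) * (1 + lam) / (1 - lam), which blows up as lam approaches 1. The function name and parameters are hypothetical.

```python
def asymptotic_variance_two_state(a, b, n_terms=10_000):
    """Asymptotic variance of f = 1{X = 1} for the two-state chain with
    P(0 -> 1) = a and P(1 -> 0) = b, where 0 < a, b <= 1 and a + b < 2.

    Returns (closed_form, truncated_series), with the series equal to
    Var_pi(f) * (1 + 2 * sum_{k=1}^{n_terms} lam**k) and lam = 1 - a - b.
    """
    pi1 = a / (a + b)                 # stationary probability of state 1
    var_f = pi1 * (1.0 - pi1)         # Var_pi(f)
    lam = 1.0 - a - b                 # non-trivial eigenvalue of the transition matrix
    closed = var_f * (1.0 + lam) / (1.0 - lam)
    series = var_f * (1.0 + 2.0 * sum(lam ** k for k in range(1, n_terms + 1)))
    return closed, series
```

With a = b = 0.5 we get lam = 0, and the asymptotic variance equals Var_pi(f) = 0.25. With a = b = 0.01 we get lam = 0.98, and it is 99 times larger (24.75).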
-
Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
A $φ$-irreducible and aperiodic Markov chain with stationary probability distribution will converge to its stationary distribution from almost all starting points. The property of Harris recurrence allows us to replace "almost all" by "all," which is potentially important when running Markov chain Monte Carlo algorithms. Full-dimensional Metropolis-Hastings algorithms are known to be Harris recurrent. In this paper, we consider conditions under which Metropolis-within-Gibbs and trans-dimensional Markov chains are or are not Harris recurrent. We present a simple but natural two-dimensional counter-example showing how Harris recurrence can fail, and also a variety of positive results which guarantee Harris recurrence. We also present some open problems. We close with a discussion of the practical implications for MCMC algorithms.
Submitted 14 February, 2007;
originally announced February 2007.
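For readers unfamiliar with the sampler in question, here is a minimal sketch of a deterministic-scan Metropolis-within-Gibbs chain. It updates one coordinate at a time, using a symmetric random-walk proposal and a Metropolis accept/reject step. The bivariate normal target, the step size, and the starting point are assumptions chosen for illustration. The sketch does not reproduce the paper's counter-example and does not check Harris recurrence; it only shows the kind of chain whose convergence from every starting point x0 is at issue.

```python
import math
import random

def log_target(x):
    """Hypothetical target: bivariate normal, unit variances, correlation 0.5."""
    rho = 0.5
    q = (x[0] ** 2 - 2 * rho * x[0] * x[1] + x[1] ** 2) / (1 - rho ** 2)
    return -0.5 * q

def metropolis_within_gibbs(n_sweeps, step=1.0, x0=(3.0, -3.0), seed=1):
    """Return (samples after each sweep, overall acceptance rate)."""
    rng = random.Random(seed)
    x = list(x0)
    lp = log_target(x)
    accepted = 0
    samples = []
    for _ in range(n_sweeps):
        for i in range(len(x)):              # deterministic scan over coordinates
            y = list(x)
            y[i] += rng.gauss(0.0, step)     # symmetric random-walk move in coordinate i only
            lp_y = log_target(y)
            # Metropolis acceptance; the proposal is symmetric, so no Hastings correction
            if rng.random() < math.exp(min(0.0, lp_y - lp)):
                x, lp = y, lp_y
                accepted += 1
        samples.append(tuple(x))
    return samples, accepted / (n_sweeps * len(x))
```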
-
Quantitative bounds on convergence of time-inhomogeneous Markov chains
Authors:
R. Douc,
E. Moulines,
Jeffrey S. Rosenthal
Abstract:
Convergence rates of Markov chains have been widely studied in recent years. In particular, quantitative bounds on convergence rates have been studied in various forms by Meyn and Tweedie [Ann. Appl. Probab. 4 (1994) 981-1101], Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566], Roberts and Tweedie [Stochastic Process. Appl. 80 (1999) 211-229], Jones and Hobert [Statist. Sci. 16 (2001) 312-334] and Fort [Ph.D. thesis (2001) Univ. Paris VI]. In this paper, we extend a result of Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566] that concerns quantitative convergence rates for time-homogeneous Markov chains. Our extension allows us to consider f-total variation distance (instead of total variation) and time-inhomogeneous Markov chains. We apply our results to simulated annealing.
Submitted 24 March, 2005;
originally announced March 2005.
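Simulated annealing, the paper's application, is a standard example of a time-inhomogeneous Markov chain: at step n, the Metropolis kernel targets a density proportional to exp(-U(x) / T_n), and that target changes as the temperature T_n decreases. The sketch below is only an illustration. The double-well energy U, the logarithmic schedule T_n = c / log(n + 2) with c = 2, the proposal scale, and all names are assumptions, and none of the paper's quantitative bounds are computed here.

```python
import math
import random

def energy(x):
    """Hypothetical double-well objective; global minimum near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def simulated_annealing(n_iter, c=2.0, step=0.5, x0=0.0, seed=4):
    """Metropolis moves targeting exp(-energy / T_n) with T_n decreasing in n,
    so the transition kernel depends on time. Returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, u = x0, energy(x0)
    best_x, best_u = x, u
    for n in range(1, n_iter + 1):
        temp = c / math.log(n + 2)            # logarithmic cooling schedule
        y = x + rng.gauss(0.0, step)
        u_y = energy(y)
        if rng.random() < math.exp(min(0.0, -(u_y - u) / temp)):
            x, u = y, u_y
            if u < best_u:                    # track the best state visited
                best_x, best_u = x, u
    return best_x, best_u
```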
-
Moment conditions for a sequence with negative drift to be uniformly bounded in L^r
Authors:
Robin Pemantle,
Jeffrey S. Rosenthal
Abstract:
Suppose a sequence of random variables {X_n} has negative drift when above a certain threshold and has increments bounded in L^p. When p>2 this implies that E X_n is bounded above by a constant independent of n and the particular sequence {X_n}. When p<=2 there are counterexamples showing this does not hold. In general, increments bounded in L^p lead to a uniform L^r bound on X_n^+ for any r<p-1, but not for r>=p-1. These results are motivated by questions about stability of queueing networks.
Submitted 5 April, 2004;
originally announced April 2004.
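The Lindley recursion from queueing, X_{n+1} = max(X_n + xi_{n+1}, 0), gives a concrete instance of the light-tailed case. Here the drift is negative above 0, and the increments are bounded in every L^p. This example is chosen for illustration and is not one of the paper's constructions. The choice xi ~ Normal(-0.5, 1) and X_0 = 0 are assumptions. For this recursion started at 0, E X_n is non-decreasing in n and bounded by the stationary mean, which is at most Var(xi) / (2 |E xi|) = 1. The Monte Carlo estimates should therefore stay below roughly 1.

```python
import random

def lindley_means(n_steps=200, n_paths=2000, drift=-0.5, seed=2):
    """Monte Carlo estimates of E[X_n], n = 1..n_steps, for
    X_{n+1} = max(X_n + xi, 0), X_0 = 0, xi ~ Normal(drift, 1)."""
    rng = random.Random(seed)
    x = [0.0] * n_paths                      # independent paths, all started at 0
    means = []
    for _ in range(n_steps):
        x = [max(v + rng.gauss(drift, 1.0), 0.0) for v in x]
        means.append(sum(x) / n_paths)
    return means
```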
-
General state space Markov chains and MCMC algorithms
Authors:
Gareth O. Roberts,
Jeffrey S. Rosenthal
Abstract:
This paper surveys various results about Markov chains on general (non-countable) state spaces. It begins with an introduction to Markov chain Monte Carlo (MCMC) algorithms, which provide the motivation and context for the theory which follows. Then, sufficient conditions for geometric and uniform ergodicity are presented, along with quantitative bounds on the rate of convergence to stationarity. Many of these results are proved using direct coupling constructions based on minorisation and drift conditions. Necessary and sufficient conditions for Central Limit Theorems (CLTs) are also presented, in some cases proved via the Poisson Equation or direct regeneration constructions. Finally, optimal scaling and weak convergence results for Metropolis-Hastings algorithms are discussed. None of the results presented is new, though many of the proofs are. We also describe some Open Problems.
Submitted 11 April, 2007; v1 submitted 2 April, 2004;
originally announced April 2004.
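The optimal-scaling results mentioned in the survey can be illustrated with a random-walk Metropolis sampler in dimension d. The sampler uses Gaussian proposals with per-coordinate standard deviation ell / sqrt(d). For targets with i.i.d. components, the well-known asymptotic recommendation is ell of about 2.38, which gives an acceptance rate of about 0.234. The sketch below assumes a standard normal target, d = 50, and a start drawn from the target. At finite d, the empirical rate should be near, but not exactly, 0.234. The function name and parameters are hypothetical.

```python
import math
import random

def rwm_acceptance_rate(d=50, n_iter=20000, ell=2.38, seed=3):
    """Random-walk Metropolis on a standard normal target in R^d with proposal
    N(x, (ell**2 / d) I). Returns the empirical acceptance rate."""
    rng = random.Random(seed)
    sigma = ell / math.sqrt(d)                   # per-coordinate proposal sd
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # start in stationarity
    sq = sum(v * v for v in x)                   # ||x||^2, so log pi(x) = -sq / 2 + const
    accepted = 0
    for _ in range(n_iter):
        y = [v + rng.gauss(0.0, sigma) for v in x]
        sq_y = sum(v * v for v in y)
        log_ratio = -0.5 * (sq_y - sq)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, sq = y, sq_y
            accepted += 1
    return accepted / n_iter
```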