
Quantifying the Speed-Up from Non-Reversibility in MCMC Tempering Algorithms

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We investigate the increase in efficiency of simulated and parallel tempering MCMC algorithms when using non-reversible updates to give them "momentum". By making a connection to a certain simple discrete Markov chain, we show that, under appropriate assumptions, the non-reversible algorithms still exhibit diffusive behaviour, just on a different time scale. We use this to argue that the optimally scaled versions of the non-reversible algorithms are indeed more efficient than the optimally scaled versions of their traditional reversible counterparts, but only by a modest speed-up factor of about 42%.

Submitted 27 January, 2025; originally announced January 2025.
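The "simple discrete Markov chain" picture can be illustrated with a toy temperature-index process on levels 0, ..., K. The Python sketch below rests on an illustrative assumption: every swap between adjacent levels is accepted with the same probability a. It compares a reversible index walk, which proposes +1 or -1 at random, with a lifted (non-reversible) walk that keeps a direction and reverses it after each rejection. The function mean_round_trip and its parameters are hypothetical and are not the authors' code. At a fixed a the lifted walk completes round trips faster, but this toy does not reproduce the 42% figure, which refers to optimally scaled versions of the algorithms.

```python
import random

def mean_round_trip(K, a, lifted, n_trips=200, seed=0):
    """Average number of steps for a temperature index on {0, ..., K}
    to travel 0 -> K -> 0.

    Illustrative assumption: every adjacent swap is accepted with the same
    probability a. The reversible walk proposes +1 or -1 with equal
    probability; the lifted walk proposes a move in its current direction
    eps and reverses eps whenever that move is rejected."""
    rng = random.Random(seed)
    i, eps = 0, 1               # current level and (lifted walk only) direction
    target, steps, trips = K, 0, 0
    while trips < n_trips:
        steps += 1
        j = i + eps if lifted else i + rng.choice((-1, 1))
        # Moves outside {0, ..., K} are always rejected.
        accepted = 0 <= j <= K and rng.random() < a
        if accepted:
            i = j
        elif lifted:
            eps = -eps          # a rejection reverses the "momentum"
        if i == target:
            if target == 0:
                trips += 1      # completed 0 -> K -> 0
            target = 0 if target == K else K
    return steps / n_trips
```

For example, mean_round_trip(10, 0.7, lifted=False) and mean_round_trip(10, 0.7, lifted=True) can be compared directly. In this toy, both walks remain diffusive over long runs, because each rejection sends the lifted walk back the way it came.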

arXiv:2411.17084

Upper and lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo

Authors: Austin Brown, Jeffrey S. Rosenthal

Abstract: We investigate lower bounds on the subgeometric convergence of adaptive Markov chain Monte Carlo under any adaptation strategy. In particular, we prove general lower bounds in total variation and on the weak convergence rate under general adaptation plans. If the adaptation diminishes sufficiently fast, we also develop comparable convergence rate upper bounds that are capable of approximately matching the convergence rate in the subgeometric lower bound. These results provide insight into the optimal design of adaptation strategies and also limitations on the convergence behavior of adaptive Markov chain Monte Carlo. Applications to an adaptive unadjusted Langevin algorithm as well as adaptive Metropolis-Hastings with independent proposals and random-walk proposals are explored.

Submitted 25 November, 2024; originally announced November 2024.

MSC Class: 60J05; 60J22; 60G07

arXiv:2408.06894

Exploring the generalizability of the optimal 0.234 acceptance rate in random-walk Metropolis and parallel tempering algorithms

Authors: Aidan Li, Liyan Wang, Tianye Dou, Jeffrey S. Rosenthal

Abstract: For random-walk Metropolis (RWM) and parallel tempering (PT) algorithms, an asymptotic acceptance rate of around 0.234 is known to be optimal in certain high-dimensional limits. However, its practical relevance is uncertain due to restrictive derivation conditions. We synthesise previous theoretical advances in extending the 0.234 acceptance rate to more general settings, and demonstrate its applicability with a comprehensive empirical simulation study on examples examining how acceptance rates affect Expected Squared Jumping Distance (ESJD). Our experiments show that the optimality of the 0.234 acceptance rate for RWM is surprisingly robust even in lower dimensions across various proposal distributions, multimodal distributions that may not have an i.i.d. product density, and curved Rosenbrock target distributions with nonlinear correlation structure. Parallel tempering experiments also show that the idealized 0.234 spacing of inverse temperatures may be approximately optimal for low dimensions and non-i.i.d. product target densities, and that constructing an inverse temperature ladder with spacings given by a swap acceptance of 0.234 is a viable strategy.

Submitted 11 June, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

Comments: Under review at Communications in Statistics - Simulation and Computation
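The link between acceptance rate and ESJD can be seen on the simplest possible target. The sketch below assumes a standard Gaussian target in d dimensions and proposals x + (ell / sqrt(d)) Z. The helper rwm_gaussian is hypothetical and is not the study's code. In the classical high-dimensional theory for this target, ell of about 2.38 corresponds to an acceptance rate near 0.234. Sweeping ell and recording both quantities therefore shows the trade-off between accepting often and jumping far.

```python
import numpy as np

def rwm_gaussian(d, ell, n_iter=20_000, seed=0):
    """Random-walk Metropolis on a standard d-dimensional Gaussian target.

    Proposal: y = x + (ell / sqrt(d)) * Z with Z ~ N(0, I_d).
    Returns (acceptance rate, ESJD), where ESJD is the average over
    iterations of ||X_{n+1} - X_n||^2 (rejected moves contribute 0)."""
    rng = np.random.default_rng(seed)
    sigma = ell / np.sqrt(d)

    def log_pi(z):
        return -0.5 * np.dot(z, z)      # log-density up to a constant

    x = rng.standard_normal(d)          # start in stationarity
    accepts, sq_jumps = 0, 0.0
    for _ in range(n_iter):
        y = x + sigma * rng.standard_normal(d)
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            sq_jumps += np.sum((y - x) ** 2)
            x = y
            accepts += 1
    return accepts / n_iter, sq_jumps / n_iter
```

A sweep such as `[rwm_gaussian(20, ell) for ell in (0.5, 1.0, 2.38, 4.0)]` shows the acceptance rate falling as ell grows, while ESJD peaks at an intermediate scale.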

arXiv:2408.04155

Comparing the Efficiency of General State Space Reversible MCMC Algorithms

Authors: Geoffrey T. Salmon, Jeffrey S. Rosenthal

Abstract: We review and provide new proofs of results used to compare the efficiency of estimates generated by reversible MCMC algorithms on a general state space. We provide a full proof of the formula for the asymptotic variance for real-valued functionals on $φ$-irreducible reversible Markov chains, first introduced by Kipnis and Varadhan. Given two Markov kernels $P$ and $Q$ with stationary measure $π$, we say that the Markov kernel $P$ efficiency dominates the Markov kernel $Q$ if the asymptotic variance with respect to $P$ is at most the asymptotic variance with respect to $Q$ for every real-valued functional $f\in L^2(π)$. Assuming only a basic background in functional analysis, we prove that for two $φ$-irreducible reversible Markov kernels $P$ and $Q$, $P$ efficiency dominates $Q$ if and only if the operator $Q-P$, where $P$ is the operator on $L^2(π)$ that maps $f\mapsto\int f(y)P(\cdot,dy)$ and similarly for $Q$, is positive on $L^2(π)$, i.e. $\langle f,(Q-P)f\rangle\geq0$ for every $f\in L^2(π)$. We use this result to show that reversible antithetic kernels are more efficient than i.i.d. sampling, and that efficiency dominance is a partial ordering on $φ$-irreducible reversible Markov kernels. We also provide a proof based on that of Tierney that Peskun dominance is a sufficient condition for efficiency dominance for reversible kernels.
Using these results, we show that Markov kernels formed by randomly selecting other "component" Markov kernels will always efficiency dominate another Markov kernel formed in this way, as long as the component kernels of the former efficiency dominate those of the latter. These results on the efficiency dominance of combining component kernels generalise the results on the efficiency dominance of combined chains introduced by Neal and Rosenthal from finite state spaces to general state spaces.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 27 pages

MSC Class: 60J05

arXiv:2406.00820

doi 10.1017/jpr.2025.4

Weak convergence of adaptive Markov chain Monte Carlo

Authors: Austin Brown, Jeffrey S. Rosenthal

Abstract: This article develops general conditions for weak convergence of adaptive Markov chain Monte Carlo processes, which are shown to imply a weak law of large numbers for bounded Lipschitz continuous functions. This allows an estimation theory for adaptive Markov chain Monte Carlo where previously developed theory in total variation may fail or be difficult to establish. Extensions of weak convergence to general Wasserstein distances are established along with a weak law of large numbers for possibly unbounded Lipschitz functions. The results are applied to auto-regressive processes in various settings, unadjusted Langevin processes, and adaptive Metropolis-Hastings.

Submitted 23 December, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: Fixed a typo in Assumption 4 and added improvements to Section 8

MSC Class: 60J05; 60J22

arXiv:2309.15735

Estimating MCMC convergence rates using common random number simulation

Authors: Sabrina Sixta, Jeffrey S. Rosenthal, Austin Brown

Abstract: This paper shows how to use common random number (CRN) simulation to evaluate Markov chain Monte Carlo (MCMC) convergence to stationarity. We provide an upper bound on the Wasserstein distance of a Markov chain to its stationary distribution after $N$ steps in terms of averages over CRN simulations. We apply our bound to Gibbs samplers on a variance component model, a model related to James-Stein estimators, and a Bayesian linear regression model. For the former two examples, we show that the CRN simulated bound converges to zero significantly more quickly than available drift and minorization bounds.

Submitted 29 May, 2025; v1 submitted 27 September, 2023; originally announced September 2023.
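The basic ingredient behind CRN-based bounds is the coupling inequality. If $Y_0 \sim π$ and $(X_N, Y_N)$ is any coupling, then $W_1(\mathcal{L}(X_N), π) \le E|X_N - Y_N|$. The hedged sketch below assumes a Gaussian AR(1) chain $X_{n+1} = ρX_n + Z_{n+1}$, whose stationary law is $N(0, 1/(1-ρ^2))$. It drives both copies with the same $Z$'s and averages $|X_N - Y_N|$ over replicates. This Monte Carlo estimate only illustrates the idea and is not the paper's bound. The function name crn_w1_upper_bound is hypothetical.

```python
import numpy as np

def crn_w1_upper_bound(x0, rho, n_steps, reps=10_000, seed=0):
    """Monte Carlo estimate of E|X_N - Y_N| for the AR(1) chain
    X_{n+1} = rho * X_n + Z_{n+1}, with X_0 = x0, Y_0 ~ N(0, 1/(1 - rho^2))
    (the stationary law), and both copies driven by common random numbers.
    Its expectation upper-bounds W_1(law of X_N, pi)."""
    rng = np.random.default_rng(seed)
    stat_sd = 1.0 / np.sqrt(1.0 - rho**2)
    y = rng.normal(0.0, stat_sd, size=reps)   # Y_0 drawn from pi
    x = np.full(reps, float(x0))              # X_0 = x0 in every replicate
    for _ in range(n_steps):
        z = rng.standard_normal(reps)         # the common random numbers
        x = rho * x + z
        y = rho * y + z
    return np.mean(np.abs(x - y))
```

For this particular chain, $X_N - Y_N = ρ^N (X_0 - Y_0)$ exactly, so the estimate shrinks geometrically in $N$. For Gibbs samplers such as those in the abstract, the coupled differences would have to be simulated rather than computed in closed form.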

arXiv:2305.18268

doi 10.1017/jpr.2024.48

Efficiency of reversible MCMC methods: elementary derivations and applications to composite methods

Authors: Radford M. Neal, Jeffrey S. Rosenthal

Abstract: We review criteria for comparing the efficiency of Markov chain Monte Carlo (MCMC) methods with respect to the asymptotic variance of estimates of expectations of functions of state, and show how such criteria can justify ways of combining improvements to MCMC methods. We say that a chain on a finite state space with transition matrix $P$ efficiency-dominates one with transition matrix $Q$ if for every function of state it has lower (or equal) asymptotic variance. We give elementary proofs of some previous results regarding efficiency dominance, leading to a self-contained demonstration that a reversible chain with transition matrix $P$ efficiency-dominates a reversible chain with transition matrix $Q$ if and only if none of the eigenvalues of $Q-P$ are negative. This allows us to conclude that modifying a reversible MCMC method to improve its efficiency will also improve the efficiency of a method that randomly chooses either this or some other reversible method, and to conclude that improving the efficiency of a reversible update for one component of state (as in Gibbs sampling) will improve the overall efficiency of a reversible method that combines this and other updates. It also explains how antithetic MCMC can be more efficient than i.i.d. sampling. We also establish conditions that can guarantee that a method is not efficiency-dominated by any other method.

Submitted 27 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 24 pages

Journal ref: J. Appl. Probab. 62 (2025) 188-208
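On a finite state space, the eigenvalue criterion can be checked directly. The Python sketch below compares it with asymptotic variances computed from the fundamental matrix $Z = (I - P + \mathbf{1}π^T)^{-1}$, using $σ^2(f) = 2\langle \bar f, Z\bar f\rangle_π - \langle \bar f, \bar f\rangle_π$ with $\bar f = f - π(f)$. The helper names and the two-state example are hypothetical. $Q$ is i.i.d. sampling from uniform $π$, and $P$ is a reversible "antithetic" kernel that switches state with probability 0.8. For $f$ equal to the indicator of state 0, the variances come out to about 0.0625 under $P$ and 0.25 under $Q$, consistent with $Q-P$ having eigenvalues 0 and 0.6.

```python
import numpy as np

def asymptotic_variance(P, f, pi):
    """sigma^2(f) = Var_pi(f) + 2 * sum_{k>=1} Cov_pi(f(X_0), f(X_k)),
    computed as 2 <f_bar, Z f_bar>_pi - <f_bar, f_bar>_pi with the
    fundamental matrix Z = (I - P + 1 pi^T)^{-1}."""
    n = len(pi)
    f_bar = f - pi @ f
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return 2 * np.sum(pi * f_bar * (Z @ f_bar)) - np.sum(pi * f_bar**2)

def efficiency_dominates(P, Q, pi, tol=1e-12):
    """True if no eigenvalue of Q - P is negative, assuming P and Q are both
    pi-reversible. Conjugating by diag(sqrt(pi)) turns Q - P into a symmetric
    matrix with the same eigenvalues, so eigvalsh can be used."""
    s = np.sqrt(pi)
    S = s[:, None] * (Q - P) / s[None, :]
    return np.linalg.eigvalsh(0.5 * (S + S.T)).min() >= -tol

pi = np.array([0.5, 0.5])
Q = np.array([[0.5, 0.5], [0.5, 0.5]])   # i.i.d. sampling from pi
P = np.array([[0.2, 0.8], [0.8, 0.2]])   # reversible, antithetic-style kernel
f = np.array([1.0, 0.0])                 # indicator of state 0
```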

arXiv:2210.10513

Sampling via Rejection-Free Partial Neighbor Search

Authors: Sigeng Chen, Jeffrey S. Rosenthal, Aki Dote, Hirotaka Tamura, Ali Sheikholeslami

Abstract: The Metropolis algorithm produces a Markov chain that converges to a specified target density $π$. To improve its efficiency, we can use the Rejection-Free version of the Metropolis algorithm, which avoids the inefficiency of rejections by evaluating all neighbors. Rejection-Free can be made more efficient through the use of parallel hardware. However, for some specialized hardware, such as the Digital Annealing Unit, the number of units will limit the number of neighbors being considered at each step. Hence, we propose an enhanced version of Rejection-Free known as Partial Neighbor Search, which only considers a portion of the neighbors while using the Rejection-Free technique. This method is tested on several examples to demonstrate its effectiveness and advantages under different circumstances.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 34 pages and 11 figures
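The Rejection-Free idea can be sketched for binary state vectors. The sketch assumes uniform single-bit-flip proposals, which is an illustrative choice. From state x, the ordinary Metropolis chain leaves with probability $P_x$, the mean of the acceptance probabilities of all neighbors. It therefore stays at x for a Geometric($P_x$) number of iterations, and when it does move, the destination is chosen in proportion to those acceptance probabilities. Recording each visited state with that multiplicity reproduces the ordinary chain's time averages without ever simulating a rejection. The function name is hypothetical. Partial Neighbor Search would restrict the candidate set to a subset of neighbors. That variant is not implemented here, because the abstract does not specify how the subset is chosen.

```python
import numpy as np

def rejection_free_bits(log_pi, x0, n_jumps, seed=0):
    """Rejection-Free Metropolis on {0,1}^d with uniform single-bit-flip
    proposals. Returns the visited states and their multiplicities (the
    number of iterations the ordinary Metropolis chain would stay there)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=int)
    d = x.size
    states = np.empty((n_jumps, d), dtype=int)
    weights = np.empty(n_jumps, dtype=int)
    for t in range(n_jumps):
        lp_x = log_pi(x)
        probs = np.empty(d)
        for i in range(d):                       # evaluate every neighbour
            y = x.copy()
            y[i] ^= 1
            probs[i] = np.exp(min(0.0, log_pi(y) - lp_x))  # min(1, pi(y)/pi(x))
        states[t] = x
        weights[t] = rng.geometric(probs.mean())  # holding time at x (>= 1)
        k = rng.choice(d, p=probs / probs.sum())  # always move, never reject
        x[k] ^= 1
    return states, weights
```

A weighted average such as `(w[:, None] * states).sum(axis=0) / w.sum()` then estimates the marginal probabilities of each bit under $π$.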

arXiv:2205.06578

Football Group Draw Probabilities and Corrections

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: This paper considers the challenge of designing football group draw mechanisms which have the uniform distribution over all valid draw assignments, but are also entertaining, practical, and transparent. We explain how to simulate the FIFA Sequential Draw method, and how to compute the non-uniformity of its draws by comparison to a uniform Rejection Sampler. We then propose two practical methods of achieving the uniform distribution while still using balls and bowls in a way which is suitable for a televised draw. The solutions can also be tried interactively.

Submitted 25 January, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: 33 pages
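A uniform Rejection Sampler of the kind used as the benchmark here can be sketched in a few lines. The sketch assumes that each pot holds exactly one team per group, and it uses a hypothetical validity rule: no two teams from the same confederation, taken as the first letter of the team's name, may share a group. Shuffling every pot uniformly makes all pot-respecting assignments equally likely. Redrawing until the assignment is valid therefore yields the uniform distribution over valid assignments. This is neither the FIFA procedure nor either of the paper's televised methods, and the function names are hypothetical.

```python
import random

def rejection_draw(pots, is_valid, rng):
    """Group g receives the g-th team of a uniformly shuffled copy of each
    pot; the whole draw is repeated until is_valid(groups) holds."""
    n_groups = len(pots[0])
    while True:
        groups = [[] for _ in range(n_groups)]
        for pot in pots:
            for g, team in enumerate(rng.sample(pot, len(pot))):
                groups[g].append(team)
        if is_valid(groups):
            return groups

def no_shared_confederation(groups):
    # Hypothetical rule: a team's confederation is the first letter of its name.
    return all(len({team[0] for team in grp}) == len(grp) for grp in groups)
```

With pots `[["A1", "B1"], ["A2", "C2"]]`, only two labelled assignments are valid: A1 with C2 in one group, and B1 with A2 in the other. Each should appear about half the time.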

arXiv:2205.02083

Optimization via Rejection-Free Partial Neighbor Search

Authors: Sigeng Chen, Jeffrey S. Rosenthal, Aki Dote, Hirotaka Tamura, Ali Sheikholeslami

Abstract: Simulated Annealing using Metropolis steps at decreasing temperatures is widely used to solve complex combinatorial optimization problems. To improve its efficiency, we can use the Rejection-Free version of the Metropolis algorithm, which avoids the inefficiency of rejections by considering all the neighbors at every step. To prevent the algorithm from becoming stuck in local extreme areas, we propose an enhanced version of Rejection-Free called Partial Neighbor Search (PNS), which only considers random parts of the neighbors while applying Rejection-Free. We demonstrate the superior performance of the Rejection-Free PNS algorithm by applying these methods to several examples, such as the QUBO problem, the Knapsack problem, the 3R3XOR problem, and quadratic programming.

Submitted 7 October, 2022; v1 submitted 15 April, 2022; originally announced May 2022.

Comments: 24 pages plus 2 pages of references, 9 figures

arXiv:2203.04395

Equivalences of Geometric Ergodicity of Markov Chains

Authors: M. A. Gallegos-Herrada, D. Ledvinka, J. S. Rosenthal

Abstract: This paper gathers together different conditions which are all equivalent to geometric ergodicity of time-homogeneous Markov chains on general state spaces. A total of 34 different conditions are presented (27 for general chains plus 7 just for reversible chains), some old and some new, in terms of such notions as convergence bounds, drift conditions, spectral properties, etc., with different assumptions about the distance metric used, finiteness of function moments, initial distribution, uniformity of bounds, and more. Proofs of the connections between the different conditions are provided, mostly self-contained but using some results from the literature where appropriate.

Submitted 3 July, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 30 pages. Two additional equivalences added after publication

arXiv:2201.06560

Optimal Strategies and Rules for the Game of Horse

Authors: Daniel Rosenthal, Jeffrey S. Rosenthal

Abstract: We investigate the probability of scoring a point when playing the basketball shooting game called "Horse". We show that under the Traditional Rules, it is optimal to choose very easy shots. We propose alternative rules called Pops Rules, and show that they lead to more difficult optimal shots, and thus to a more interesting game.

Submitted 17 January, 2022; originally announced January 2022.

Comments: 10 pages; to appear in the Notices of the American Mathematical Society

arXiv:2112.03982

Convergence rate bounds for iterative random functions using one-shot coupling

Authors: Sabrina Sixta, Jeffrey S. Rosenthal

Abstract: One-shot coupling is a method of bounding the convergence rate between two copies of a Markov chain in total variation distance, which was first introduced by Roberts and Rosenthal and generalized by Madras and Sezer. The method is divided into two parts: the contraction phase, when the chains converge in expected distance, and the coalescing phase, which occurs at the last iteration, when there is an attempt to couple. One-shot coupling does not require the use of any exogenous variables like a drift function or a minorization constant. In this paper, we summarize the one-shot coupling method into the One-Shot Coupling Theorem. We then apply the theorem to two families of Markov chains: the random functional autoregressive process and the autoregressive conditional heteroscedastic (ARCH) process. We provide multiple examples of how the theorem can be used on various models including ones in high dimensions. These examples illustrate how the theorem's conditions can be verified in a straightforward way. The one-shot coupling method appears to generate tight geometric convergence rate bounds.

Submitted 1 July, 2022; v1 submitted 7 December, 2021; originally announced December 2021.
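The two phases can be made concrete in the simplest setting: $X_{k+1} = f(X_k) + σZ_{k+1}$ with $f$ being $L$-Lipschitz and Gaussian noise. This setting is an illustrative assumption, and the sketch is not the One-Shot Coupling Theorem itself. In the contraction phase, both copies share the same noise, so $|X_{n-1} - Y_{n-1}| \le L^{n-1}|x_0 - y_0|$. In the coalescing phase, the last Gaussian update is maximally coupled, and the copies fail to meet with probability $\mathrm{TV}(N(f(X_{n-1}), σ^2), N(f(Y_{n-1}), σ^2)) = \mathrm{erf}(|f(X_{n-1}) - f(Y_{n-1})| / (2\sqrt{2}σ))$. Bounding the gap by $L^n|x_0 - y_0|$ gives the closed form below. The function name is hypothetical.

```python
import math

def one_shot_tv_bound(x0, y0, lip, sigma, n):
    """Upper bound on TV(law of X_n, law of Y_n) for n >= 1, where
    X_{k+1} = f(X_k) + sigma * Z_{k+1} and f is lip-Lipschitz.

    Contraction (steps 1..n-1, common noise): |X_{n-1} - Y_{n-1}| <= lip**(n-1) |x0 - y0|.
    Coalescing (step n, maximal coupling of two Gaussians with equal variance):
    P(X_n != Y_n) = erf(|f(X_{n-1}) - f(Y_{n-1})| / (2 sqrt(2) sigma))."""
    gap = lip ** n * abs(x0 - y0)   # upper bound on |f(X_{n-1}) - f(Y_{n-1})|
    return math.erf(gap / (2.0 * math.sqrt(2.0) * sigma))
```

Because erf is increasing, replacing the random gap with its deterministic upper bound keeps the result a valid bound. The ARCH case in the abstract has state-dependent noise, which this sketch does not cover.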

arXiv:2108.13491

doi 10.3847/1538-4357/ac4494

Bayesian Inference of Globular Cluster Properties Using Distribution Functions

Authors: Gwendolyn M. Eadie, Jeremy J. Webb, Jeffrey S. Rosenthal

Abstract: We present a Bayesian inference approach to estimating the cumulative mass profile and mean squared velocity profile of a globular cluster given the spatial and kinematic information of its stars. Mock globular clusters with a range of sizes and concentrations are generated from lowered isothermal dynamical models, from which we test the reliability of the Bayesian method to estimate model parameters through repeated statistical simulation. We find that given unbiased star samples, we are able to reconstruct the cluster parameters used to generate the mock cluster and the cluster's cumulative mass and mean velocity squared profiles with good accuracy. We further explore how strongly biased sampling, which could be the result of observing constraints, may affect this approach. Our tests indicate that if we instead have biased samples, then our estimates can be off in certain ways that are dependent on cluster morphology. Overall, our findings motivate obtaining samples of stars that are as unbiased as possible. This may be achieved by combining information from multiple telescopes (e.g., Hubble and Gaia), but will require careful modeling of the measurement uncertainties through a hierarchical model, which we plan to pursue in future work.

Submitted 30 August, 2021; originally announced August 2021.

Comments: submitted to ApJ; 21 pages, 11 figures

arXiv:2105.05719

Dimension-free Mixing for High-dimensional Bayesian Variable Selection

Authors: Quan Zhou, Jun Yang, Dootika Vats, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: Yang et al. (2016) proved that the symmetric random walk Metropolis--Hastings algorithm for Bayesian variable selection is rapidly mixing under mild high-dimensional assumptions. We propose a novel MCMC sampler using an informed proposal scheme, which we prove achieves a much faster mixing time that is independent of the number of covariates, under the same assumptions. To the best of our knowledge, this is the first high-dimensional result which rigorously shows that the mixing rate of informed MCMC methods can be fast enough to offset the computational cost of local posterior evaluation. Motivated by the theoretical analysis of our sampler, we further propose a new approach called "two-stage drift condition" to studying convergence rates of Markov chains on general state spaces, which can be useful for obtaining tight complexity bounds in high-dimensional settings. The practical advantages of our algorithm are illustrated by both simulation studies and real data analysis.

Submitted 23 April, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

MSC Class: 62F15; 60J20

arXiv:2105.00520

doi 10.1080/03610918.2023.2199352

Sampling by Divergence Minimization

Authors: Ameer Dharamshi, Vivian Ngo, Jeffrey S. Rosenthal

Abstract: We introduce a Markov Chain Monte Carlo (MCMC) method that is designed to sample from target distributions with irregular geometry using an adaptive scheme. In cases where targets exhibit non-Gaussian behaviour, we propose that adaption should be regional rather than global. Our algorithm minimizes the information projection component of the Kullback-Leibler (KL) divergence between the proposal and target distributions to encourage proposals that are distributed similarly to the regional geometry of the target. Unlike traditional adaptive MCMC, this procedure rapidly adapts to the geometry of the target's current position as it explores the surrounding space without the need for many preexisting samples. The divergence minimization algorithms are tested on target distributions with irregularly shaped modes and we provide results demonstrating the effectiveness of our methods.

Submitted 6 May, 2022; v1 submitted 2 May, 2021; originally announced May 2021.

Comments: 33 pages, 12 figures

arXiv:2012.04786

Convergence Rates of Attractive-Repulsive MCMC Algorithms

Authors: Yu Hang Jiang, Tong Liu, Zhiya Lou, Jeffrey S. Rosenthal, Shanshan Shangguan, Fei Wang, Zixuan Wu

Abstract: We consider MCMC algorithms for certain particle systems which include both attractive and repulsive forces, making their convergence analysis challenging. We prove that a version of these algorithms on a bounded state space is uniformly ergodic with an explicit quantitative convergence rate. We also prove that a version on an unbounded state-space is still geometrically ergodic, and then use the method of shift-coupling to obtain an explicit quantitative bound on its convergence rate.

Submitted 1 September, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: 26 pages, 2 figures

MSC Class: 60J10 (primary); 60J20; 60J22 (secondary)

arXiv:2012.02816

MCMC Confidence Intervals and Biases

Authors: Yu Hang Jiang, Tong Liu, Zhiya Lou, Jeffrey S. Rosenthal, Shanshan Shangguan, Fei Wang, Zixuan Wu

Abstract: The recent paper "Simple confidence intervals for MCMC without CLTs" by J.S. Rosenthal derived a simple MCMC confidence interval using only Chebyshev's inequality, not the CLT. That result required certain assumptions about how the estimator bias and variance grow with the number of iterations $n$; in particular, that the bias is $o(1/\sqrt{n})$. This assumption seemed mild: it is generally believed that the estimator bias will be $O(1/n)$ and hence $o(1/\sqrt{n})$. However, researchers raised questions about how to verify this assumption, and indeed we show that it might not always hold. In this paper, we seek to simplify and weaken the assumptions in the previously mentioned paper, to make MCMC confidence intervals without CLTs more widely applicable.

Submitted 29 June, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: 20 pages (not including references)

MSC Class: 60J10; 62E20
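The Chebyshev step behind such intervals is short. For an estimator $\hat θ$ of $θ$, $P(|\hat θ - θ| \ge c) \le E[(\hat θ - θ)^2]/c^2$, and the mean squared error equals variance plus squared bias. Choosing $c = \sqrt{\mathrm{MSE}/α}$ therefore gives coverage of at least $1 - α$. The bias assumption matters because, when the bias is $o(1/\sqrt{n})$ and the variance is of order $1/n$, the squared bias is asymptotically negligible relative to the variance. The hedged Python sketch below is not the construction from either paper. It assumes several independent replicate MCMC estimates, ignores bias, and plugs in their sample variance, so its coverage guarantee is only approximate. The function name is hypothetical.

```python
import math
import statistics

def chebyshev_interval(replicates, alpha=0.05):
    """Approximate (1 - alpha) interval for the mean of independent replicate
    MCMC estimates via Chebyshev's inequality, treating bias as negligible
    and using the sample variance in place of the true variance."""
    m = len(replicates)
    centre = statistics.fmean(replicates)
    var_of_mean = statistics.variance(replicates) / m
    half_width = math.sqrt(var_of_mean / alpha)   # c = sqrt(Var / alpha)
    return centre - half_width, centre + half_width
```

For alpha = 0.05, the half-width is $\sqrt{20} \approx 4.47$ standard errors, noticeably wider than the CLT-based 1.96. This is the price of not relying on a CLT.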

arXiv:2011.07946

doi 10.1007/s42979-022-01494-2

Introducing a new high-resolution handwritten digits data set with writer characteristics

Authors: Cédric Beaulac, Jeffrey S. Rosenthal

Abstract: The contributions in this article are two-fold. First, we introduce a new handwritten digit data set that we collected. It contains high-resolution images of handwritten digits together with various writer characteristics which are not available in the well-known MNIST database. The multiple writer characteristics gathered are a novelty of our data set and create new research opportunities. The data set is publicly available online. Second, we analyse this new data set. We begin with simple supervised tasks. We assess the predictability of the writer characteristics gathered, the effect of using some of those characteristics as predictors in classification tasks, and the effect of higher resolution images on classification accuracy. We also explore semi-supervised applications; we can leverage the high quantity of handwritten digits data sets already existing online to improve the accuracy of various classification tasks with noticeable success. Finally, we also demonstrate the generative perspective offered by this new data set; we are able to generate images that mimic the writing style of specific writers. The data set has unique and distinct features and our analysis establishes benchmarks and showcases some of the new opportunities made possible with this new data set.

Submitted 13 April, 2022; v1 submitted 4 November, 2020; originally announced November 2020.

Comments: Data set available here: https://drive.google.com/drive/folders/1f2o1kjXLvcxRgtmMMuDkA2PQ5Zato4Or?usp=sharing

Journal ref: SN COMPUT. SCI. 4, 66 (2023)

arXiv:2009.12424

Skew Brownian Motion and Complexity of the ALPS Algorithm

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal, Nicholas G. Tawn

Abstract: Simulated tempering is a popular method of allowing MCMC algorithms to move between modes of a multimodal target density $π$. The paper [24] introduced the Annealed Leap-Point Sampler (ALPS) to allow for rapid movement between modes. In this paper, we prove that, under appropriate assumptions, a suitably scaled version of the ALPS algorithm converges weakly to skew Brownian motion. Our results show that under appropriate assumptions, the ALPS algorithm mixes in time $O(d[\log d]^2)$ or $O(d)$, depending on which version is used.

Submitted 12 May, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

arXiv:2008.10675 [pdf, other]

doi 10.1090/noti2253

The Coupling/Minorization/Drift Approach to Markov Chain Convergence Rates

Authors: Yu Hang Jiang, Tong Liu, Zhiya Lou, Jeffrey S. Rosenthal, Shanshan Shangguan, Fei Wang, Zixuan Wu

Abstract: This review paper provides an introduction to Markov chains and their convergence rates, an important and interesting mathematical topic that also has important applications to the very widely used Markov chain Monte Carlo (MCMC) algorithms. We first discuss eigenvalue analysis for Markov chains on finite state spaces. Then, using the coupling construction, we prove two quantitative bounds based on minorization and drift conditions, and provide descriptive and intuitive examples to showcase how these theorems can be implemented in practice. This paper is meant to provide a general overview of the subject and spark interest in new Markov chain research areas.

Submitted 1 September, 2021; v1 submitted 24 August, 2020; originally announced August 2020.

Comments: 14 pages, 2 figures. For web appendix please see http://www.probability.ca/NoticesApp. This is the updated version of previous paper: Markov Chain Convergence Rates from Coupling Constructions

MSC Class: 60J10 (Primary) 60J05; 60J22 (Secondary)
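As a small illustration of the eigenvalue analysis on finite state spaces mentioned in this abstract, the sketch below uses a hypothetical two-state chain (our own toy example, not one taken from the paper). Its eigenvalues are 1 and 1 - a - b, and the total variation distance to stationarity from state 0 decays exactly like a/(a+b) · |1 - a - b|^n, which the code checks against direct matrix powers. The names `transition_matrix`, `tv_from_state0` and `tv_eigen_formula` are illustrative only.

```python
import numpy as np


def transition_matrix(a: float, b: float) -> np.ndarray:
    """Two-state chain: from state 0 move to 1 w.p. a; from state 1 move to 0 w.p. b."""
    return np.array([[1.0 - a, a], [b, 1.0 - b]])


def tv_from_state0(a: float, b: float, n: int) -> float:
    """Total variation distance between P^n(0, .) and the stationary distribution."""
    pi = np.array([b, a]) / (a + b)                       # solves pi P = pi
    row = np.linalg.matrix_power(transition_matrix(a, b), n)[0]
    return 0.5 * float(np.abs(row - pi).sum())


def tv_eigen_formula(a: float, b: float, n: int) -> float:
    """Closed form from the spectrum {1, 1 - a - b}: TV = a / (a + b) * |1 - a - b|^n."""
    return a / (a + b) * abs(1.0 - a - b) ** n


a, b = 0.3, 0.1  # hypothetical switching probabilities
eigs = np.sort(np.linalg.eigvals(transition_matrix(a, b)).real)  # eigenvalues 0.6 and 1.0
for n in (1, 5, 20):
    print(n, tv_from_state0(a, b, n), tv_eigen_formula(a, b, n))
```

The second eigenvalue alone sets the geometric rate here; the coupling, minorization and drift bounds discussed in the paper are aimed at state spaces where such an explicit spectrum is not available.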

arXiv:2001.05534 [pdf, other]

doi 10.1080/08839514.2020.1815151

An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial: A report from the Children's Oncology Group

Authors: Cédric Beaulac, Jeffrey S. Rosenthal, Qinglin Pei, Debra Friedman, Suzanne Wolden, David Hodgson

Abstract: In this manuscript we analyze a data set containing information on children with Hodgkin Lymphoma (HL) enrolled on a clinical trial. Treatments received and survival status were collected together with other covariates such as demographics and clinical measurements. Our main task is to explore the potential of machine learning (ML) algorithms in a survival analysis context in order to improve on the Cox Proportional Hazard (CoxPH) model. We discuss the weaknesses of the CoxPH model we would like to improve upon and then we introduce multiple algorithms, from well-established ones to state-of-the-art models, that solve these issues. We then compare every model according to the concordance index and the Brier score. Finally, we produce a series of recommendations, based on our experience, for practitioners who would like to benefit from the recent advances in artificial intelligence.

Submitted 26 March, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

Journal ref: Applied Artificial Intelligence 2020
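The abstract compares models by the concordance index. A minimal sketch of Harrell's C-index for right-censored data follows; it is a textbook definition written for illustration, not the authors' evaluation code, and it assumes the simplest tie conventions (pairs with tied observed times are skipped, tied risk scores count one half). The helper name `harrell_c_index` is hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the subject was censored
    risks  : predicted risk scores (higher means expected to fail sooner)
    """
    concordant, comparable = 0.0, 0
    for a, b in combinations(range(len(times)), 2):
        i, j = (a, b) if times[a] < times[b] else (b, a)   # i has the shorter time
        # Comparable only if the shorter time is an observed event (tied times skipped).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0          # higher predicted risk failed first
        elif risks[i] == risks[j]:
            concordant += 0.5          # tied predictions count one half
    return concordant / comparable if comparable else float("nan")


# Hypothetical toy data: four subjects, the third one censored.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

Only pairs whose shorter time is an observed event count, because a censored subject's true failure time is unknown; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.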

arXiv:1910.13316 [pdf, other]

doi 10.1007/s00180-021-01095-2

Jump Markov Chains and Rejection-Free Metropolis Algorithms

Authors: J. S. Rosenthal, A. Dote, K. Dabiri, H. Tamura, S. Chen, A. Sheikholeslami

Abstract: We consider versions of the Metropolis algorithm which avoid the inefficiency of rejections. We first illustrate that a natural Uniform Selection Algorithm might not converge to the correct distribution. We then analyse the use of Markov jump chains which avoid successive repetitions of the same state. After exploring the properties of jump chains, we show how they can exploit parallelism in computer hardware to produce more efficient samples. We apply our results to the Metropolis algorithm, to Parallel Tempering, to a Bayesian model, to a two-dimensional ferromagnetic 4 x 4 Ising model, and to a pseudo-marginal MCMC algorithm.

Submitted 28 October, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: 25 pages, 10 figures, 3 tables
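The jump-chain idea in this abstract can be sketched on a small discrete example, under assumptions of our own: a hypothetical five-state ring target, Metropolis proposals to the two neighbours, and an estimator that weights each visited state by its expected holding time 1/(1 - P(x, x)) in the original chain. That weighting is one standard way to correct jump-chain averages; it is not presented as the paper's exact algorithm, and the function names are illustrative.

```python
import random


def metropolis_matrix(pi):
    """Metropolis kernel on a ring of n >= 3 states, proposing x - 1 or x + 1 (mod n) w.p. 1/2 each."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in ((x - 1) % n, (x + 1) % n):
            P[x][y] += 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])        # rejected mass stays at x
    return P


def rejection_free_estimate(pi, f, n_jumps, seed=0):
    """Estimate E_pi[f] from the jump chain of the Metropolis kernel.

    The jump chain never repeats a state: from x it moves to y != x with probability
    P[x][y] / (1 - P[x][x]). The original chain would have stayed at x for a geometric
    number of steps with mean 1 / (1 - P[x][x]), so each visit is weighted by that mean.
    """
    rng = random.Random(seed)
    P = metropolis_matrix(pi)
    n = len(pi)
    x, num, den = 0, 0.0, 0.0
    for _ in range(n_jumps):
        weight = 1.0 / (1.0 - P[x][x])   # expected holding time at x
        num += weight * f(x)
        den += weight
        others = [y for y in range(n) if y != x]
        # random.choices normalizes the weights, giving P[x][y] / (1 - P[x][x]).
        x = rng.choices(others, weights=[P[x][y] for y in others])[0]
    return num / den


pi = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical target on a 5-state ring
print(rejection_free_estimate(pi, lambda x: x, 50000))             # close to E_pi[X] = 2.0
print(rejection_free_estimate(pi, lambda x: float(x == 2), 50000))  # close to 0.4
```

Without the holding-time weights the jump chain would over-represent states with high escape probability, since its own stationary distribution is proportional to π(x)(1 - P(x, x)).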

arXiv:1904.12157 [pdf, ps, other]

Optimal Scaling of Random-Walk Metropolis Algorithms on General Target Distributions

Authors: Jun Yang, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: One main limitation of the existing optimal scaling results for Metropolis-Hastings algorithms is that the assumptions on the target distribution are unrealistic. In this paper, we consider optimal scaling of random-walk Metropolis algorithms on general target distributions in high dimensions arising from practical MCMC models from Bayesian statistics. For optimal scaling by maximizing expected squared jumping distance (ESJD), we show the asymptotically optimal acceptance rate $0.234$ can be obtained under general realistic sufficient conditions on the target distribution. The new sufficient conditions are easy to verify and may hold for some general classes of MCMC models arising from Bayesian statistics applications, and they substantially generalize the product i.i.d. condition required in most of the existing literature on optimal scaling. Furthermore, we show one-dimensional diffusion limits can be obtained under slightly stronger conditions, which still allow dependent coordinates of the target distribution. We also connect the new diffusion limit results to complexity bounds of Metropolis algorithms in high dimensions.

Submitted 4 May, 2020; v1 submitted 27 April, 2019; originally announced April 2019.

Comments: 45 pages
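A quick numerical check of the 0.234 figure, in the classical i.i.d. Gaussian setting only (the paper's point is that the result extends to more general targets, which this toy does not test): random-walk Metropolis on N(0, I_d) with proposal scale ℓ/√d and ℓ = 2.38 should accept roughly 23% of proposals when d is large, matching the limit 2Φ(-ℓ/2). The function names and parameter values are our own assumptions.

```python
import math

import numpy as np


def rwm_acceptance_rate(d: int, ell: float = 2.38, n_iter: int = 20000, seed: int = 0) -> float:
    """Empirical acceptance rate of random-walk Metropolis on N(0, I_d).

    Proposals are y = x + (ell / sqrt(d)) * Z with Z ~ N(0, I_d).
    """
    rng = np.random.default_rng(seed)

    def log_pi(z: np.ndarray) -> float:
        return -0.5 * float(z @ z)       # log density up to an additive constant

    x = rng.standard_normal(d)           # start in stationarity
    lx = log_pi(x)
    sigma = ell / math.sqrt(d)
    accepted = 0
    for _ in range(n_iter):
        y = x + sigma * rng.standard_normal(d)
        ly = log_pi(y)
        if rng.random() < math.exp(min(0.0, ly - lx)):
            x, lx = y, ly
            accepted += 1
    return accepted / n_iter


def limiting_acceptance(ell: float) -> float:
    """High-dimensional limit 2 * Phi(-ell / 2) of the acceptance rate for this i.i.d. target."""
    return 1.0 + math.erf(-ell / (2.0 * math.sqrt(2.0)))


print(rwm_acceptance_rate(100), limiting_acceptance(2.38))
```

Scaling the proposal by 1/√d is what keeps the acceptance rate bounded away from 0 as d grows; a fixed proposal scale would make the acceptance rate collapse in high dimensions.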

arXiv:1812.00126 [pdf, ps, other]

Simple Confidence Intervals for MCMC Without CLTs

Authors: Jeffrey S. Rosenthal

Abstract: This short note argues that 95% confidence intervals for MCMC estimates can be obtained even without establishing a CLT, by multiplying their widths by 2.3.

Submitted 30 November, 2018; originally announced December 2018.

Comments: 4 pages

MSC Class: 60J05
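One plausible route to a factor near 2.3, offered here as our own reconstruction rather than the note's stated argument, is Chebyshev's inequality: if an estimator has finite variance, a half-width of 1/√0.05 ≈ 4.47 standard deviations guarantees at least 95% coverage with no CLT, and 4.47/1.96 ≈ 2.28. The sketch computes that ratio and applies the 2.3 multiplier to a normal-theory interval; it treats the supplied standard error as exact, which is an assumption, and `widened_interval` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def chebyshev_widening_factor(level: float = 0.95) -> float:
    """Ratio of the Chebyshev half-width to the normal-theory half-width at `level`.

    Chebyshev: P(|X - mu| >= k * sd) <= 1 / k**2, so k = 1 / sqrt(1 - level) gives
    coverage >= level for any finite-variance estimator, without a CLT.
    """
    k_chebyshev = 1.0 / math.sqrt(1.0 - level)
    z_normal = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level = 0.95
    return k_chebyshev / z_normal


def widened_interval(estimate: float, std_error: float, factor: float = 2.3, z: float = 1.96):
    """Hypothetical helper: the usual estimate +/- z * std_error interval, widened by `factor`."""
    half_width = factor * z * std_error
    return estimate - half_width, estimate + half_width


print(round(chebyshev_widening_factor(), 3))   # 2.282
print(widened_interval(10.0, 0.5))             # roughly (7.746, 12.254)
```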

arXiv:1811.12323 [pdf, other]

A Deep Latent-Variable Model Application to Select Treatment Intensity in Survival Analysis

Authors: Cédric Beaulac, Jeffrey S. Rosenthal, David Hodgson

Abstract: In the following short article we adapt a new and popular machine learning model for inference on medical data sets. Our method is based on the Variational AutoEncoder (VAE) framework that we adapt to survival analysis on small data sets with missing values. In our model, the true health status appears as a set of latent variables that affect the observed covariates and the survival chances. We show that this flexible model allows insightful decision-making using a predicted distribution and outperforms a classic survival analysis model.

Submitted 29 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/53

arXiv:1808.04782 [pdf, other]

Weight-Preserving Simulated Tempering

Authors: Nicholas G. Tawn, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: Simulated tempering is a popular method of allowing MCMC algorithms to move between modes of a multimodal target density π. One problem with simulated tempering for multimodal targets is that the weights of the various modes change for different inverse-temperature values, sometimes dramatically so. In this paper, we provide a fix to overcome this problem, by adjusting the mode weights to be preserved (i.e., constant) over different inverse-temperature settings. We then apply simulated tempering algorithms to multimodal targets using our mode weight correction. We present simulations in which our weight-preserving algorithm mixes between modes much more successfully than traditional tempering algorithms. We also prove a diffusion limit for a version of our algorithm, which shows that, under appropriate assumptions, our algorithm mixes in time $O(d[\log d]^2)$.

Submitted 11 February, 2019; v1 submitted 14 August, 2018; originally announced August 2018.
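The problem this abstract describes, mode weights shifting with the inverse temperature β, is easy to see numerically. The sketch below, a toy of our own that does not implement the authors' weight-preserving fix, tempers a hypothetical two-component Gaussian mixture and compares the numerically integrated mode masses of π^β with a separated-mode approximation we derived for Gaussian components: the mass of mode i is proportional to w_i^β σ_i^(1-β).

```python
import numpy as np


def tempered_mode_weights(weights, means, sds, beta):
    """Mass of each mode of pi**beta for a 1-D two-component Gaussian mixture pi.

    Integrates pi(x)**beta on a fine grid and splits the mass at the midpoint
    between the two (well separated) component means.
    """
    x = np.linspace(-60.0, 60.0, 240001)
    dx = x[1] - x[0]
    log_comp = [np.log(w) - np.log(s) - 0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) / s) ** 2
                for w, m, s in zip(weights, means, sds)]
    log_pi = np.logaddexp(log_comp[0], log_comp[1])   # log of the mixture density
    dens = np.exp(beta * log_pi)                       # unnormalized tempered density
    split = 0.5 * (means[0] + means[1])
    left = dens[x < split].sum() * dx
    right = dens[x >= split].sum() * dx
    return np.array([left, right]) / (left + right)


def approx_tempered_weights(weights, sds, beta):
    """Separated-mode approximation: weight of mode i proportional to w_i**beta * sd_i**(1 - beta)."""
    raw = np.array(weights) ** beta * np.array(sds) ** (1.0 - beta)
    return raw / raw.sum()


weights, means, sds = (0.5, 0.5), (-20.0, 20.0), (1.0, 0.1)   # hypothetical mixture
for beta in (1.0, 0.5, 0.1):
    print(beta, tempered_mode_weights(weights, means, sds, beta),
          approx_tempered_weights(weights, sds, beta))
```

With equal weights and standard deviations 1 and 0.1, the wide mode carries about 89% of the tempered mass at β = 0.1, although it carries 50% under π; a sampler that moves between temperatures therefore sees very different mode proportions at the hot and cold ends.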

arXiv:1807.01239 [pdf, other]

Bayesian Spatial Analysis of Hardwood Tree Counts in Forests via MCMC

Authors: Reihaneh Entezari, Patrick E. Brown, Jeffrey S. Rosenthal

Abstract: In this paper, we perform Bayesian inference to analyze spatial tree count data from the Timiskaming and Abitibi River forests in Ontario, Canada. We consider a Bayesian Generalized Linear Geostatistical Model and implement a Markov Chain Monte Carlo algorithm to sample from its posterior distribution. We study how spatial predictions for new sites in the forests change as the amount of training data is reduced, and compare them with a Logistic Regression model without a spatial effect. Finally, we discuss a stratified sampling approach for selecting subsets of data that potentially allows for better predictions.

Submitted 3 July, 2018; originally announced July 2018.

arXiv:1804.10168 [pdf, other]

doi 10.1007/s00180-020-00987-z

BEST: A decision tree algorithm that handles missing values

Authors: Cédric Beaulac, Jeffrey S. Rosenthal

Abstract: The main contribution of this paper is the development of a new decision tree algorithm. The proposed approach allows users to guide the algorithm through the data partitioning process. We believe this feature has many applications but in this paper we demonstrate how to utilize this algorithm to analyse data sets containing missing values. We tested our algorithm against simulated data sets with various missing data structures and a real data set. The results demonstrate that this new classification procedure efficiently handles missing values and produces results that are slightly more accurate and more interpretable than most common procedures without any imputations or pre-processing.

Submitted 14 April, 2020; v1 submitted 26 April, 2018; originally announced April 2018.

Comments: To appear in Computational Statistics

Journal ref: Computational Statistics 2020

arXiv:1802.03418 [pdf, other]

doi 10.1007/s11162-019-09546-y

Predicting University Students' Academic Success and Major using Random Forests

Authors: Cédric Beaulac, Jeffrey S. Rosenthal

Abstract: In this article, a large data set containing every course taken by every undergraduate student in a major university in Canada over 10 years is analysed. Modern machine learning algorithms can use large data sets to build useful tools for the data provider, in this case, the university. In this article, two classifiers are constructed using random forests. First, the courses a student completed in their first two semesters are used to predict whether they will obtain an undergraduate degree. Second, for the students who completed a program, their major is predicted, again using the first few courses they registered for. A classification tree is an intuitive and powerful classifier, and building a random forest of trees improves this classifier. Random forests also allow for reliable variable importance measurements. These measures explain which variables are useful to the classifiers and can be used to better understand what is statistically related to the students' situation. The results are two accurate classifiers and a variable importance analysis that provides useful information to university administrations.

Submitted 12 January, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

Journal ref: Research in Higher Education 2019

arXiv:1708.00829 [pdf, ps, other]

Complexity Results for MCMC derived from Quantitative Bounds

Authors: Jun Yang, Jeffrey S. Rosenthal

Abstract: This paper considers how to obtain MCMC quantitative convergence bounds which can be translated into tight complexity bounds in high-dimensional settings. We propose a modified drift-and-minorization approach, which establishes generalized drift conditions defined in subsets of the state space. The subsets are called the "large sets", and are chosen to rule out some "bad" states which have poor drift properties when the dimension of the state space gets large. Using the "large sets" together with a "fitted family of drift functions", a quantitative bound can be obtained which can be translated into a tight complexity bound. As a demonstration, we analyze several Gibbs samplers and obtain complexity upper bounds for the mixing time. In particular, for one example of a Gibbs sampler related to the James-Stein estimator, we show that the number of iterations required for the Gibbs sampler to converge is constant under certain conditions on the observed data and the initial state. It is our hope that this modified drift-and-minorization approach can be employed in many other specific examples to obtain complexity bounds for high-dimensional Markov chains.

Submitted 10 May, 2022; v1 submitted 2 August, 2017; originally announced August 2017.

Comments: to appear in Annals of Applied Probability

arXiv:1702.07441 [pdf, ps, other]

doi 10.1017/apr.2021.10

Approximations of Geometrically Ergodic Reversible Markov Chains

Authors: Jeffrey Negrea, Jeffrey S. Rosenthal

Abstract: A common tool in the practice of Markov Chain Monte Carlo is to use approximating transition kernels to speed up computation when the desired kernel is slow to evaluate or intractable. A limited set of quantitative tools exists to assess the relative accuracy and efficiency of such approximations. We derive a set of tools for such analysis based on the Hilbert space generated by the stationary distribution we intend to sample, $L_2(π)$. Our results apply to approximations of reversible chains which are geometrically ergodic, as is typically the case for applications to Markov Chain Monte Carlo. The focus of our work is on determining whether the approximating kernel will preserve the geometric ergodicity of the exact chain, and whether the approximating stationary distribution will be close to the original stationary distribution. For reversible chains, our results extend the results of Johndrow et al. [18] from the uniformly ergodic case to the geometrically ergodic case, under some additional regularity conditions. We then apply our results to a number of approximate MCMC algorithms.

Submitted 20 January, 2021; v1 submitted 23 February, 2017; originally announced February 2017.

Journal ref: Adv. Appl. Probab. 53 (2021) 981-1022

arXiv:1702.03917 [pdf, ps, other]

MEXIT: Maximal un-coupling times for stochastic processes

Authors: P. A. Ernst, W. S. Kendall, G. O. Roberts, J. S. Rosenthal

Abstract: Classical coupling constructions arrange for copies of the same Markov process started at two different initial states to become equal as soon as possible. In this paper, we consider an alternative coupling framework in which one seeks to arrange for two different Markov (or other stochastic) processes to remain equal for as long as possible, when started in the same state. We refer to this "un-coupling" or "maximal agreement" construction as MEXIT, standing for "maximal exit". After highlighting the importance of un-coupling arguments in a few key statistical and probabilistic settings, we develop an explicit MEXIT construction for stochastic processes in discrete time with countable state-space. This construction is generalized to random processes on general state-space running in continuous time, and then exemplified by discussion of MEXIT for Brownian motions with two different constant drifts.

Submitted 30 December, 2018; v1 submitted 13 February, 2017; originally announced February 2017.

Comments: 28 pages

MSC Class: 60J05; 60J25; 60J60

Journal ref: Stochastic Processes and their Applications, 129(2): 355-380 (2019)
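A basic ingredient behind "remaining equal as long as possible" in discrete time is the one-step maximal coupling of two discrete distributions p and q, under which P(X = Y) equals the overlap Σ_s min(p(s), q(s)) = 1 - TV(p, q). The sketch below shows only that standard one-step construction; the paper's MEXIT construction for entire processes is not reproduced here, and the function names are hypothetical.

```python
import random


def overlap_probability(p, q):
    """sum_s min(p(s), q(s)), i.e. 1 - TV(p, q); p and q are dicts state -> probability."""
    return sum(min(p.get(s, 0.0), q.get(s, 0.0)) for s in set(p) | set(q))


def maximal_coupling(p, q, rng):
    """Draw (X, Y) with X ~ p, Y ~ q and P(X == Y) = overlap_probability(p, q)."""
    states = list(set(p) | set(q))
    overlap = [min(p.get(s, 0.0), q.get(s, 0.0)) for s in states]
    if rng.random() < sum(overlap):
        # Agree: draw a common state from the normalized overlap.
        s = rng.choices(states, weights=overlap)[0]
        return s, s
    # Disagree: draw each coordinate from its residual; the residuals have disjoint supports.
    x = rng.choices(states, weights=[p.get(s, 0.0) - o for s, o in zip(states, overlap)])[0]
    y = rng.choices(states, weights=[q.get(s, 0.0) - o for s, o in zip(states, overlap)])[0]
    return x, y


rng = random.Random(0)
p = {0: 0.5, 1: 0.5}     # hypothetical example distributions
q = {1: 0.5, 2: 0.5}
draws = [maximal_coupling(p, q, rng) for _ in range(20000)]
print(overlap_probability(p, q), sum(x == y for x, y in draws) / len(draws))
```

Each coordinate keeps its correct marginal (the overlap part plus its own residual), and no coupling can make P(X = Y) exceed the overlap, which is why this one-step construction is called maximal.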

arXiv:1611.03141 [pdf, ps, other]

Hitting Time and Convergence Rate Bounds for Symmetric Langevin Diffusions

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We provide quantitative bounds on the convergence to stationarity of real-valued Langevin diffusions with symmetric target densities.

Submitted 9 November, 2016; originally announced November 2016.

arXiv:1605.02113 [pdf, other]

Likelihood Inflating Sampling Algorithm

Authors: Reihaneh Entezari, Radu V. Craiu, Jeffrey S. Rosenthal

Abstract: Markov Chain Monte Carlo (MCMC) sampling from a posterior distribution corresponding to a massive data set can be computationally prohibitive since producing one sample requires a number of operations that is linear in the data size. In this paper, we introduce a new communication-free parallel method, the Likelihood Inflating Sampling Algorithm (LISA), that significantly reduces computational costs by randomly splitting the dataset into smaller subsets and running MCMC methods independently in parallel on each subset using different processors. Each processor will be used to run an MCMC chain that samples sub-posterior distributions which are defined using an "inflated" likelihood function. We develop a strategy for combining the draws from different sub-posteriors to study the full posterior of the Bayesian Additive Regression Trees (BART) model. The performance of the method is tested using both simulated and real data.

Submitted 30 June, 2017; v1 submitted 6 May, 2016; originally announced May 2016.

Comments: 32 pages, 3 figures, submitted
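To see why an "inflated" likelihood can make each sub-posterior resemble the full posterior, consider a conjugate toy model of our own choosing (not BART): data ~ N(μ, σ²) with a flat prior on μ, split randomly into K equal subsets. We assume here that inflation means raising each subset's likelihood to the power K; under that assumption the sub-posterior variance σ²/(K · n/K) equals the full-data variance σ²/n, while an uninflated sub-posterior is K times too wide. The paper's strategy for combining draws across processors is not shown, and the function name is hypothetical.

```python
import numpy as np


def flat_prior_normal_posterior(data, sigma, inflation=1.0):
    """Posterior (mean, variance) for mu when data ~ N(mu, sigma^2), the prior on mu
    is flat, and the likelihood is raised to the power `inflation` (assumed form)."""
    data = np.asarray(data, dtype=float)
    return float(data.mean()), sigma**2 / (inflation * data.size)


rng = np.random.default_rng(1)
sigma, K = 2.0, 4                                    # hypothetical noise level and subset count
data = rng.normal(3.0, sigma, size=1000)
full_mean, full_var = flat_prior_normal_posterior(data, sigma)

subsets = np.array_split(rng.permutation(data), K)   # random equal-size split
inflated = [flat_prior_normal_posterior(s, sigma, inflation=K) for s in subsets]
plain = [flat_prior_normal_posterior(s, sigma) for s in subsets]
print("full posterior variance:", full_var)
print("inflated sub-posterior variances:", [v for _, v in inflated])
print("plain sub-posterior variances:", [v for _, v in plain])
```

Only the spread is matched here; each inflated sub-posterior is still centred at its own subset mean, which is why some combination step across subsets is needed in practice.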

arXiv:1603.03510 [pdf, other]

Adaptive Component-wise Multiple-Try Metropolis Sampling

Authors: Jinyoung Yang, Evgeny Levi, Radu V. Craiu, Jeffrey S. Rosenthal

Abstract: One of the most widely used samplers in practice is the component-wise Metropolis-Hastings (CMH) sampler that updates in turn the components of a vector-valued Markov chain using accept-reject moves generated from a proposal distribution. When the target distribution of a Markov chain is irregularly shaped, a 'good' proposal distribution for one part of the state space might be a 'poor' one for another part of the state space. We consider a component-wise multiple-try Metropolis (CMTM) algorithm that can automatically choose from a set of candidate moves sampled from different distributions. The computational efficiency is increased using an adaptation rule for the CMTM algorithm that dynamically builds a better set of proposal distributions as the Markov chain runs. The ergodicity of the adaptive chain is demonstrated theoretically. The performance is studied via simulations and real data examples.

Submitted 21 March, 2017; v1 submitted 10 March, 2016; originally announced March 2016.
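The multiple-try step at the heart of this abstract can be sketched for a single scalar component, under assumptions of our own: symmetric Gaussian proposals with different hypothetical scales, one candidate per scale, and selection weights w_j(y) = π(y), which is one valid choice for symmetric proposals. This is not the authors' CMTM implementation and omits their adaptation rule; the returned index of the accepted proposal is simply the kind of statistic an adaptation scheme could track.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mtm_update(x, log_pi, scales, rng):
    """One multiple-try Metropolis update of a scalar component x.

    Candidate j comes from a symmetric Gaussian proposal with its own scale scales[j].
    With weights w_j(y) = pi(y), the move is accepted with probability
    min(1, sum_j pi(y_j) / sum_j pi(x*_j)).
    Returns the new state and the index of the accepted proposal (None if rejected).
    """
    ys = [x + s * rng.gauss(0.0, 1.0) for s in scales]
    log_w = [log_pi(y) for y in ys]
    m = max(log_w)
    sel = rng.choices(range(len(ys)), weights=[math.exp(v - m) for v in log_w])[0]
    y = ys[sel]
    # Reference points: draw from each proposal centred at y, but reuse x for the selected index.
    refs = [x if j == sel else y + s * rng.gauss(0.0, 1.0) for j, s in enumerate(scales)]
    log_ratio = logsumexp(log_w) - logsumexp([log_pi(z) for z in refs])
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y, sel
    return x, None


def run_chain(n_iter, scales=(0.5, 2.0, 8.0), seed=0):
    """Hypothetical driver: standard normal target, started at 0."""
    rng = random.Random(seed)

    def log_pi(z):
        return -0.5 * z * z

    x, draws, chosen = 0.0, [], [0] * len(scales)
    for _ in range(n_iter):
        x, sel = mtm_update(x, log_pi, scales, rng)
        if sel is not None:
            chosen[sel] += 1             # how often each proposal scale wins
        draws.append(x)
    return draws, chosen


draws, chosen = run_chain(20000)
print(sum(draws) / len(draws), chosen)
```

In a component-wise sampler this update would be applied to each coordinate in turn, with log_pi replaced by the log full conditional of that coordinate.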

arXiv:1411.0712 [pdf, ps, other]

Complexity Bounds for MCMC via Diffusion Limits

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We connect known results about diffusion limits of Markov chain Monte Carlo (MCMC) algorithms to the Computer Science notion of algorithm complexity. Our main result states that any diffusion limit of a Markov process implies a corresponding complexity bound (in an appropriate metric). We then combine this result with previously-known MCMC diffusion limit results to prove that under appropriate assumptions, the Random-Walk Metropolis (RWM) algorithm in $d$ dimensions takes $O(d)$ iterations to converge to stationarity, while the Metropolis-Adjusted Langevin Algorithm (MALA) takes $O(d^{1/3})$ iterations to converge to stationarity.

Submitted 3 November, 2014; originally announced November 2014.

arXiv:1403.3950 [pdf, ps, other]

doi 10.1214/14-AAP1083

Stability of adversarial Markov chains, with an application to adaptive MCMC algorithms

Authors: Radu V. Craiu, Lawrence Gray, Krzysztof Łatuszyński, Neal Madras, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We consider whether ergodic Markov chains with bounded step size remain bounded in probability when their transitions are modified by an adversary on a bounded subset. We provide counterexamples to show that the answer is no in general, and prove theorems to show that the answer is yes under various additional assumptions. We then use our results to prove convergence of various adaptive Markov chain Monte Carlo algorithms.

Submitted 5 November, 2015; v1 submitted 16 March, 2014; originally announced March 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AAP1083 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP1083

Journal ref: Annals of Applied Probability 2015, Vol. 25, No. 6, 3592-3623

arXiv:1401.3559 [pdf, ps, other]

doi 10.1214/12-AAP918

Minimising MCMC variance via diffusion limits, with an application to simulated tempering

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We derive new results comparing the asymptotic variance of diffusions by writing them as appropriate limits of discrete-time birth-death chains which themselves satisfy Peskun orderings. We then apply our results to simulated tempering algorithms to establish which choice of inverse temperatures minimises the asymptotic variance of all functionals and thus leads to the most efficient MCMC algorithm.

Submitted 15 January, 2014; originally announced January 2014.

Comments: Published at http://dx.doi.org/10.1214/12-AAP918 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP918

Journal ref: Annals of Applied Probability 2014, Vol. 24, No. 1, 131-149

arXiv:1309.7209 [pdf, ps, other]

doi 10.1214/14-AOS1278

On the efficiency of pseudo-marginal random walk Metropolis algorithms

Authors: Chris Sherlock, Alexandre H. Thiery, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We examine the behaviour of the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed precisely. Under relatively general conditions on the target distribution, we obtain limiting formulae for the acceptance rate and for the expected squared jump distance, as the dimension of the target approaches infinity, under the assumption that the noise in the estimate of the log-target is additive and is independent of the position. For targets with independent and identically distributed components, we also obtain a limiting diffusion for the first component. We then consider the overall efficiency of the algorithm, in terms of both speed of mixing and computational time. Assuming the additive noise is Gaussian with variance inversely proportional to the number of unbiased estimates that are used, we prove that the algorithm is optimally efficient when the variance of the noise is approximately 3.283 and the acceptance rate is approximately 7.001%. We also find that the optimal scaling is insensitive to the noise and that the optimal variance of the noise is insensitive to the scaling. The theory is illustrated with a simulation study using the particle marginal random walk Metropolis.

Submitted 30 December, 2014; v1 submitted 27 September, 2013; originally announced September 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOS1278 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1278

Journal ref: Annals of Statistics 2015, Vol. 43, No. 1, 238-275
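The pseudo-marginal mechanism analysed here can be mimicked in one dimension by adding simulated Gaussian noise to the exact log-target, in line with the abstract's assumption of additive, position-independent noise. The noise is centred at -σ²/2 so that exp(noise) has mean 1 (an unbiased estimate of π), and the estimate at the current state is stored rather than recomputed. This toy, including the names `pseudo_marginal_rwm` and `log_std_normal`, is our own; it is not the particle marginal sampler used in the paper, and we do not claim it reproduces the high-dimensional 7.001% acceptance figure.

```python
import math
import random


def pseudo_marginal_rwm(log_pi, x0, step, noise_var, n_iter, seed=0):
    """Random-walk Metropolis driven by noisy, unbiased estimates of pi (toy simulation).

    Each log-target evaluation returns log_pi(x) + W with W ~ N(-noise_var / 2, noise_var),
    so E[exp(W)] = 1. The estimate attached to the current state is kept until a
    proposal is accepted; it is never refreshed.
    """
    rng = random.Random(seed)
    sd = math.sqrt(noise_var)

    def noisy_log_pi(z):
        return log_pi(z) + rng.gauss(-0.5 * noise_var, sd)

    x, lx = x0, noisy_log_pi(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = x + step * rng.gauss(0.0, 1.0)
        ly = noisy_log_pi(y)
        if rng.random() < math.exp(min(0.0, ly - lx)):
            x, lx = y, ly
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


def log_std_normal(z):
    return -0.5 * z * z   # hypothetical 1-D target, up to a constant


samples, acc = pseudo_marginal_rwm(log_std_normal, 0.0, step=2.4, noise_var=1.0, n_iter=50000)
_, acc_exact = pseudo_marginal_rwm(log_std_normal, 0.0, step=2.4, noise_var=1e-12, n_iter=50000)
print(acc, acc_exact)  # the noisy chain accepts less often
```

Recomputing the estimate at the current state on every iteration would be cheaper to reason about but would no longer leave π invariant; reusing it is what makes the noisy chain exact in its stationary distribution, at the price of the lower acceptance rate.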

arXiv:1307.1799 [pdf, ps, other]

The Containment Condition and AdapFail algorithms

Authors: Krzysztof Latuszynski, Jeffrey S. Rosenthal

Abstract: This short note investigates convergence of adaptive MCMC algorithms, i.e. algorithms which modify the Markov chain update probabilities on the fly. We focus on the Containment condition introduced in Roberts and Rosenthal (2007). We show that if the Containment condition is not satisfied, then the algorithm will perform very poorly. Specifically, with positive probability, the adaptive algorithm will be asymptotically less efficient than any nonadaptive ergodic MCMC algorithm. We call such algorithms AdapFail, and conclude that they should not be used.

Submitted 28 December, 2013; v1 submitted 6 July, 2013; originally announced July 2013.

Comments: slight revision, with referees' comments incorporated

arXiv:1303.2814 [pdf, ps, other]

doi 10.1214/12-AOS1075

Convergence rate of Markov chain methods for genomic motif discovery

Authors: Dawn B. Woodard, Jeffrey S. Rosenthal

Abstract: We analyze the convergence rate of a simplified version of a popular Gibbs sampling method used for statistical discovery of gene regulatory binding motifs in DNA sequences. This sampler satisfies a very strong form of ergodicity (uniform). However, we show that, due to multimodality of the posterior distribution, the rate of convergence often decreases exponentially as a function of the length of the DNA sequence. Specifically, we show that this occurs whenever there is more than one true repeating pattern in the data. In practice there are typically multiple such patterns in biological data, the goal being to detect the most well-conserved and frequently-occurring of these. Our findings match empirical results, in which the motif-discovery Gibbs sampler has exhibited such poor convergence that it is used only for finding modes of the posterior distribution (candidate motifs) rather than for obtaining samples from that distribution. Ours are some of the first meaningful bounds on the convergence rate of a Markov chain method for sampling from a multimodal posterior distribution, as a function of statistical quantities like the number of observations.

Submitted 12 March, 2013; originally announced March 2013.

Comments: Published at http://dx.doi.org/10.1214/12-AOS1075 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1075

Journal ref: Annals of Statistics 2013, Vol. 41, No. 1, 91-124

arXiv:1104.2974 [pdf, ps, other]

doi 10.1214/10-AOAS378

Detecting multiple authorship of United States Supreme Court legal decisions using function words

Authors: Jeffrey S. Rosenthal, Albert H. Yoon

Abstract: This paper uses statistical analysis of function words used in legal judgments written by United States Supreme Court justices to determine which justices have the most variable writing style (which may indicate greater reliance on their law clerks when writing opinions), and also the extent to which different justices' writing styles are distinguishable from each other.

Submitted 15 April, 2011; originally announced April 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS378 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS378

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 1, 283-308
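The kind of function-word profile described in the abstract above can be computed as follows. This is a hypothetical sketch: the ten-word FUNCTION_WORDS list, the regular-expression tokenizer and the variability measure (mean Euclidean distance of each document's relative-frequency vector from the author's centroid) are illustrative assumptions, not the statistics used in the paper.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical short list of function words; the paper's own word list is not reproduced here.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "for", "it"]

def function_word_profile(text):
    """Relative frequency of each function word among all word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)      # avoid division by zero on empty text
    return np.array([counts[w] / total for w in FUNCTION_WORDS])

def style_variability(texts):
    """Mean Euclidean distance of each document's profile from the author's centroid."""
    profiles = np.array([function_word_profile(t) for t in texts])
    centroid = profiles.mean(axis=0)
    return float(np.mean(np.linalg.norm(profiles - centroid, axis=1)))
```

A larger value of style_variability for one author's opinions than for another's would, under these assumptions, indicate a less consistent function-word usage; distinguishing between authors would additionally need a classifier or distance between centroids, which is left out here.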

arXiv:1101.5838 [pdf, ps, other]

doi 10.1214/11-AAP806

Adaptive Gibbs samplers and related MCMC methods

Authors: Krzysztof Łatuszyński, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run by learning as they go in an attempt to optimize the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.

Submitted 27 February, 2013; v1 submitted 30 January, 2011; originally announced January 2011.

Comments: Published at http://dx.doi.org/10.1214/11-AAP806 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: substantial text overlap with arXiv:1001.2797

Report number: IMS-AAP-AAP806

Journal ref: Annals of Applied Probability 2013, Vol. 23, No. 1, 66-98
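A random-scan Gibbs sampler whose coordinate selection probabilities adapt on the fly, as discussed in the abstract above, might look like the sketch below. Everything here is an assumption made for illustration: the zero-mean bivariate Gaussian target (correlation rho, marginal standard deviations sds) and the hypothetical adaptation rule that pulls the selection probabilities toward the normalized running standard deviations with step 1/n and clips them to [eps, 1 - eps]. The decaying step and the clipping are meant in the spirit of diminishing adaptation and of keeping every coordinate selectable; whether this particular rule meets the paper's sufficient conditions is not checked here.

```python
import numpy as np

def adaptive_random_scan_gibbs(n_iter, rho=0.9, sds=(1.0, 5.0), eps=0.1, seed=0):
    """Random-scan Gibbs on a zero-mean bivariate Gaussian with adaptive selection probabilities.

    Hypothetical adaptation rule: pull p toward the normalized running standard
    deviations with step 1/n (diminishing adaptation), then clip to [eps, 1 - eps].
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(sds, dtype=float)
    x = np.zeros(2)
    p = np.array([0.5, 0.5])      # current coordinate selection probabilities
    mean = np.zeros(2)            # running mean of the chain (Welford)
    m2 = np.zeros(2)              # running sum of squared deviations (Welford)
    samples = np.empty((n_iter, 2))
    for n in range(1, n_iter + 1):
        i = rng.choice(2, p=p)    # pick the coordinate to update
        j = 1 - i
        # Exact full conditional of x_i given x_j for the bivariate Gaussian target.
        cond_mean = rho * s[i] / s[j] * x[j]
        cond_sd = s[i] * np.sqrt(1.0 - rho**2)
        x[i] = rng.normal(cond_mean, cond_sd)
        samples[n - 1] = x        # row assignment copies the current values
        # Welford update of the running mean and sum of squared deviations.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        sd_hat = np.sqrt(m2 / n) + 1e-12
        # Diminishing adaptation: move p a step of size 1/n toward the target weights.
        gamma = 1.0 / n
        p = (1.0 - gamma) * p + gamma * (sd_hat / sd_hat.sum())
        # Keep every coordinate selectable with probability at least eps.
        p = np.clip(p, eps, 1.0 - eps)
        p /= p.sum()
    return samples, p
```

With the default sds = (1.0, 5.0), the adapted probabilities should end up favouring the wider second coordinate. The cautionary example in the paper is a reminder that a rule which looks this harmless can still fail without suitable conditions.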

arXiv:1001.2797 [pdf, ps, other]

Adaptive Gibbs samplers

Authors: Krzysztof Latuszynski, Jeffrey S. Rosenthal

Abstract: We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run, by learning as they go in an attempt to optimise the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.

Submitted 15 January, 2010; originally announced January 2010.

arXiv:0806.2747 [pdf, ps, other]

doi 10.1214/07-AAP486

Variance bounding Markov chains

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We introduce a new property of Markov chains, called variance bounding. We prove that, for reversible chains at least, variance bounding is weaker than, but closely related to, geometric ergodicity. Furthermore, variance bounding is equivalent to the existence of the usual central limit theorems for all $L^2$ functionals. Also, variance bounding (unlike geometric ergodicity) is preserved under the Peskun order. We close with some applications to Metropolis-Hastings algorithms.

Submitted 17 June, 2008; originally announced June 2008.

Comments: Published at http://dx.doi.org/10.1214/07-AAP486 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP486

MSC Class: 60J10 (Primary) 65C40; 47A10 (Secondary)

Journal ref: Annals of Applied Probability 2008, Vol. 18, No. 3, 1201-1214

arXiv:math/0702412 [pdf, ps, other]

doi 10.1214/105051606000000510

Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov chains

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: A $φ$-irreducible and aperiodic Markov chain with stationary probability distribution will converge to its stationary distribution from almost all starting points. The property of Harris recurrence allows us to replace "almost all" by "all", which is potentially important when running Markov chain Monte Carlo algorithms. Full-dimensional Metropolis-Hastings algorithms are known to be Harris recurrent. In this paper, we consider conditions under which Metropolis-within-Gibbs and trans-dimensional Markov chains are or are not Harris recurrent. We present a simple but natural two-dimensional counter-example showing how Harris recurrence can fail, and also a variety of positive results which guarantee Harris recurrence. We also present some open problems. We close with a discussion of the practical implications for MCMC algorithms.

Submitted 14 February, 2007; originally announced February 2007.

Comments: Published at http://dx.doi.org/10.1214/105051606000000510 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP0201

MSC Class: 60J05 (Primary) 65C05; 60J22; 62F15 (Secondary)

Journal ref: Annals of Applied Probability 2006, Vol. 16, No. 4, 2123-2139

arXiv:math/0503532 [pdf, ps, other]

doi 10.1214/105051604000000620

Quantitative bounds on convergence of time-inhomogeneous Markov chains

Authors: R. Douc, E. Moulines, Jeffrey S. Rosenthal

Abstract: Convergence rates of Markov chains have been widely studied in recent years. In particular, quantitative bounds on convergence rates have been studied in various forms by Meyn and Tweedie [Ann. Appl. Probab. 4 (1994) 981-1101], Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566], Roberts and Tweedie [Stochastic Process. Appl. 80 (1999) 211-229], Jones and Hobert [Statist. Sci. 16 (2001) 312-334] and Fort [Ph.D. thesis (2001) Univ. Paris VI]. In this paper, we extend a result of Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558-566] that concerns quantitative convergence rates for time-homogeneous Markov chains. Our extension allows us to consider f-total variation distance (instead of total variation) and time-inhomogeneous Markov chains. We apply our results to simulated annealing.

Submitted 24 March, 2005; originally announced March 2005.

Comments: Published at http://dx.doi.org/10.1214/105051604000000620 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP018

MSC Class: 60J27; 60J22 (Primary)

Journal ref: Annals of Applied Probability 2004, Vol. 14, No. 4, 1643-1665
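Simulated annealing, the application mentioned in the abstract above, is a standard example of a time-inhomogeneous Markov chain: the Metropolis kernel at iteration n leaves a density proportional to exp(-f(x)/T_n) invariant, so the transition law changes with n. The sketch below only illustrates that structure and does not compute the paper's quantitative bounds. The objective f, the logarithmic cooling schedule T_n = 1/log(n + 1), the proposal step and the function name are all hypothetical choices.

```python
import numpy as np

def simulated_annealing(f, x0, n_iter, temp, step=1.0, seed=0):
    """Metropolis-based simulated annealing in one dimension.

    At iteration n the kernel leaves exp(-f(x) / temp(n)) invariant, so the
    chain is time-inhomogeneous. Returns the best point and value visited.
    """
    rng = np.random.default_rng(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for n in range(1, n_iter + 1):
        t = temp(n)                              # current temperature T_n
        y = x + step * rng.standard_normal()     # symmetric random-walk proposal
        fy = f(y)
        # Metropolis rule at temperature t; downhill moves are always accepted.
        if fy <= fx or rng.uniform() < np.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```

For example, with the double well f(x) = (x² - 1)² + 0.3x, started at x0 = 1.0 in the shallower well and run for 5000 iterations with temp = lambda n: 1.0 / np.log(n + 1), the search should find the deeper well near x = -1, where f is negative.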

arXiv:math/0404093 [pdf, ps, other]

Moment conditions for a sequence with negative drift to be uniformly bounded in L^r

Authors: Robin Pemantle, Jeffrey S. Rosenthal

Abstract: Suppose a sequence of random variables $\{X_n\}$ has negative drift when above a certain threshold and has increments bounded in $L^p$. When $p>2$, this implies that $E X_n$ is bounded above by a constant independent of $n$ and of the particular sequence $\{X_n\}$. When $p \le 2$, there are counterexamples showing that this does not hold. In general, increments bounded in $L^p$ lead to a uniform $L^r$ bound on $X_n^+$ for any $r<p-1$, but not for $r \ge p-1$. These results are motivated by questions about stability of queueing networks.

Submitted 5 April, 2004; originally announced April 2004.

Comments: 18 pages

MSC Class: 60G07 (Primary) 60F25 (Secondary)

Journal ref: Stoch. Proc. Appl., 82, 143-155 (1999)
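The setting of the abstract above can be explored numerically with a reflected random walk X_{n+1} = max(X_n + ξ_{n+1}, 0), where ξ ~ N(drift, 1) with drift < 0. The increments of X are bounded in absolute value by |ξ|, and Gaussian increments have moments of every order, so this example falls on the p > 2 side of the result: the estimated means of X_n should stay bounded in n. This is a Monte Carlo illustration under those assumptions (function name and parameters hypothetical), not a proof, and it does not reproduce the paper's counterexamples for p ≤ 2.

```python
import numpy as np

def mean_path(n_steps, n_paths, drift=-0.5, seed=0):
    """Monte Carlo estimate of E[X_n], n = 1..n_steps, for X_{n+1} = max(X_n + xi, 0),
    xi ~ N(drift, 1), X_0 = 0, averaged over n_paths independent paths."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    means = np.empty(n_steps)
    for n in range(n_steps):
        # Negative-drift step reflected at 0, so |X_{n+1} - X_n| <= |xi|.
        x = np.maximum(x + drift + rng.standard_normal(n_paths), 0.0)
        means[n] = x.mean()
    return means
```

Plotting or printing means for increasing n_steps should show the estimates levelling off rather than growing; a flat Monte Carlo curve is consistent with, but of course does not establish, a uniform bound.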

arXiv:math/0404033 [pdf, ps, other]

doi 10.1214/154957804100000024

General state space Markov chains and MCMC algorithms

Authors: Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: This paper surveys various results about Markov chains on general (non-countable) state spaces. It begins with an introduction to Markov chain Monte Carlo (MCMC) algorithms, which provide the motivation and context for the theory which follows. Then, sufficient conditions for geometric and uniform ergodicity are presented, along with quantitative bounds on the rate of convergence to stationarity. Many of these results are proved using direct coupling constructions based on minorisation and drift conditions. Necessary and sufficient conditions for Central Limit Theorems (CLTs) are also presented, in some cases proved via the Poisson Equation or direct regeneration constructions. Finally, optimal scaling and weak convergence results for Metropolis-Hastings algorithms are discussed. None of the results presented is new, though many of the proofs are. We also describe some Open Problems.

Submitted 11 April, 2007; v1 submitted 2 April, 2004; originally announced April 2004.

Comments: Published at http://dx.doi.org/10.1214/154957804100000024 in the Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-PS-PS-2004-15

Journal ref: Probability Surveys 2004, Vol. 1, 20-71
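One of the optimal-scaling results surveyed above can be checked empirically. For random-walk Metropolis on a product target such as N(0, I_d) with proposal N(x, (ℓ²/d) I_d), the classical optimal-scaling result gives an acceptance rate tending to 2Φ(-ℓ/2) as d grows, which is about 0.234 at the optimal ℓ ≈ 2.38. The sketch below (function name, target and run lengths are hypothetical choices) estimates the acceptance rate at a finite d.

```python
import numpy as np

def rwm_acceptance(d, ell=2.38, n_iter=20000, seed=0):
    """Empirical acceptance rate of random-walk Metropolis on N(0, I_d)
    with Gaussian proposal N(x, (ell**2 / d) I_d)."""
    rng = np.random.default_rng(seed)
    step = ell / np.sqrt(d)
    x = rng.standard_normal(d)         # start from the target, so no burn-in is needed
    log_pi_x = -0.5 * x @ x            # log-density up to an additive constant
    accepted = 0
    for _ in range(n_iter):
        y = x + step * rng.standard_normal(d)
        log_pi_y = -0.5 * y @ y
        if np.log(rng.uniform()) < log_pi_y - log_pi_x:
            x, log_pi_x = y, log_pi_y
            accepted += 1
    return accepted / n_iter
```

With d = 50 the estimate should land near, and typically slightly above, the limiting value. Shrinking ell raises the acceptance rate but makes each accepted move smaller, which is the trade-off the optimal scaling balances.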

Showing 1–50 of 52 results for author: Rosenthal, J S