-
Stationary distribution approximations of Two-island Wright-Fisher and seed-bank models using Stein's method
Authors:
Han L. Gan,
Maite Wilke-Berenguer
Abstract:
We consider two finite population Markov chain models, the two-island Wright-Fisher model with mutation, and the seed-bank model with mutation. Despite the relatively simple descriptions of the two processes, the exact form of their stationary distributions is in general intractable. For each of the two models we provide two approximation theorems with explicit upper bounds on the distance between the stationary distributions of the finite population Markov chains, and either the stationary distribution of a two-island diffusion model, or the beta distribution. We show that the order of the bounds, and correspondingly the appropriate choice of approximation, depends upon the relative sizes of mutation and migration. In the case where migration and mutation are of the same order, the suitable approximation is the two-island diffusion model, and if migration dominates mutation, then the weighted average of both islands is well approximated by a beta random variable. Our results are derived from a new development of Stein's method for the stationary distribution of the two-island diffusion model (for the weak migration results) and from the existing framework for Stein's method for the Dirichlet distribution.
Submitted 29 May, 2024;
originally announced May 2024.
-
Steady-state Dirichlet approximation of the Wright-Fisher model using the prelimit generator comparison approach of Stein's method
Authors:
Anton Braverman,
Han L. Gan
Abstract:
The Wright-Fisher model, originating in Wright (1931), is one of the canonical probabilistic models used in mathematical population genetics to study how genetic type frequencies evolve in time. In this paper we bound the rate of convergence of the stationary distribution for a finite population Wright-Fisher Markov chain with parent independent mutation to the Dirichlet distribution. Our result improves the rate of convergence established in Gan et al. (2017) from $O(1/\sqrt{N})$ to $O(1/N)$. The results are derived using Stein's method, in particular the prelimit generator comparison method.
Submitted 17 December, 2023;
originally announced December 2023.
-
Arcsine laws for random walks generated from random permutations with applications to genomics
Authors:
Xiao Fang,
Han Liang Gan,
Susan Holmes,
Haiyan Huang,
Erol Peköz,
Adrian Röllin,
Wenpin Tang
Abstract:
A classical result for the simple symmetric random walk with $2n$ steps is that the number of steps above the origin, the time of the last visit to the origin, and the time of the maximum height all have exactly the same distribution and converge when scaled to the arcsine law. Motivated by applications in genomics, we study the distributions of these statistics for the non-Markovian random walk generated from the ascents and descents of a uniform random permutation and a Mallows($q$) permutation and show that they have the same asymptotic distributions as for the simple random walk. We also state an unexpected conjecture, along with numerical evidence and a partial proof in special cases, that the number of steps above the origin by step $2n$ for the uniform permutation generated walk has exactly the same discrete arcsine distribution as for the simple random walk, even though the other statistics for these walks have very different laws. We also give explicit error bounds to the limit theorems using Stein's method for the arcsine distribution, as well as functional central limit theorems and a strong embedding of the Mallows($q$) permutation, which is of independent interest.
Submitted 23 January, 2020;
originally announced January 2020.
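As a small illustration of the classical result quoted in the abstract above (and not code from the paper), the following sketch enumerates every simple symmetric random walk path with $2n$ steps and compares the exact distribution of the number of steps above the origin with the discrete arcsine law $u_{2k}u_{2n-2k}$, where $u_{2k} = \binom{2k}{k}2^{-2k}$. The function names, and the convention that a step counts as above the origin when either of its endpoints is strictly positive, are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def discrete_arcsine_pmf(n):
    """Discrete arcsine law on {0, 2, ..., 2n}: P(2k) = u_{2k} * u_{2n-2k}."""
    def u(k):
        # u_{2k} = C(2k, k) / 2^{2k}, the chance of being at 0 at time 2k
        return Fraction(comb(2 * k, k), 4 ** k)
    return {2 * k: u(k) * u(n - k) for k in range(n + 1)}


def time_above_origin_pmf(n):
    """Exact law of the number of steps above the origin, by brute force.

    Assumed convention: a step is 'above the origin' if the walk is
    strictly positive at the start or at the end of that step.
    """
    counts = {2 * k: 0 for k in range(n + 1)}
    for steps in product((1, -1), repeat=2 * n):
        position = 0
        above = 0
        for step in steps:
            new_position = position + step
            if position > 0 or new_position > 0:
                above += 1
            position = new_position
        counts[above] += 1
    # All 2^{2n} = 4^n paths are equally likely.
    return {k: Fraction(c, 4 ** n) for k, c in counts.items()}


if __name__ == "__main__":
    n = 4
    print(time_above_origin_pmf(n) == discrete_arcsine_pmf(n))
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a floating-point tolerance; the enumeration grows like $4^n$, so this is only practical for small $n$.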
-
Stein's method for the Poisson-Dirichlet distribution and the Ewens Sampling Formula, with applications to Wright-Fisher models
Authors:
Han L. Gan,
Nathan Ross
Abstract:
We provide a general theorem bounding the error in the approximation of a random measure of interest--for example, the empirical population measure of types in a Wright-Fisher model--by a Dirichlet process, which is a measure having Poisson-Dirichlet distributed atoms with i.i.d. labels from a diffuse distribution. The implicit metric of the approximation theorem captures the sizes and locations of the masses, and so also yields bounds on the approximation between the masses of the measure of interest and the Poisson-Dirichlet distribution. We apply the result to bound the error in the approximation of the stationary distribution of types in the finite Wright-Fisher model with infinite-alleles mutation structure (not necessarily parent independent) by the Poisson-Dirichlet distribution. An important consequence of our result is an explicit upper bound on the total variation distance between the random partition generated by sampling from a finite Wright-Fisher stationary distribution, and the Ewens Sampling Formula. The bound is small if the sample size $n$ is much smaller than $N^{1/6}\log(N)^{-1/2}$, where $N$ is the total population size. Our analysis requires a result of separate interest, giving an explicit bound on the second moment of the number of types of a finite Wright-Fisher stationary distribution. The general approximation result follows from a new development of Stein's method for the Dirichlet process, which follows by viewing the Dirichlet process as the stationary distribution of a Fleming-Viot process, and then applying Barbour's generator approach.
Submitted 5 July, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Stein's method and duality of Markov processes
Authors:
Han L. Gan
Abstract:
Key ingredients for successfully applying Stein's method for distributional approximation are solutions to the Stein equations and their derivatives. Using Barbour's generator approach, one can express the solutions to the Stein equation in terms of the semi-group of a Markov process, which is typically a diffusion process if the target distribution is continuous. For an arbitrary diffusion it can be a difficult task to evaluate the semi-group and its derivatives. In this paper, for polynomial test functions, instead of calculating the semi-group of a diffusion, we utilise, via a duality argument, the semi-group of a much simpler Markov jump process. This approach yields a new method for explicitly solving the Stein equations for diffusion processes. We present both the general idea of the approach and examples for both univariate and multivariate distributions.
Submitted 3 June, 2019;
originally announced June 2019.
-
Approximation of the difference of two Poisson-like counts with Skellam
Authors:
H. L. Gan,
Eric D. Kolaczyk
Abstract:
Poisson-like behavior for event count data is ubiquitous in nature. At the same time, differencing of such counts arises in the course of data processing in a variety of areas of application. As a result, the Skellam distribution -- defined as the distribution of the difference of two independent Poisson random variables -- is a natural candidate for approximating the difference of Poisson-like event counts. However, in many contexts strict independence, whether between counts or among events within counts, is not a tenable assumption. Here we characterize the accuracy in approximating the difference of Poisson-like counts by a Skellam random variable. Our results fully generalize existing, more limited results in this direction and, at the same time, our derivations are significantly more concise and elegant. We illustrate the potential impact of these results in the context of problems from network analysis and image processing, where various forms of weak dependence can be expected.
Submitted 2 April, 2018; v1 submitted 14 August, 2017;
originally announced August 2017.
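To make the definition of the Skellam distribution in the abstract above concrete, here is a minimal sketch (not from the paper) that computes its probability mass function directly from the definition, as the law of $X - Y$ for independent $X \sim \mathrm{Poisson}(\mu_1)$ and $Y \sim \mathrm{Poisson}(\mu_2)$. The function names and the truncation parameter `terms` are assumptions introduced for illustration; the infinite sum is truncated, so the values are approximate.

```python
from math import exp, lgamma, log


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed on the log scale for stability."""
    if k < 0:
        return 0.0
    return exp(-mu + k * log(mu) - lgamma(k + 1))


def skellam_pmf(k, mu1, mu2, terms=80):
    """Approximate P(X - Y = k) for independent X ~ Poisson(mu1), Y ~ Poisson(mu2).

    Uses P(X - Y = k) = sum_j P(X = j + k) * P(Y = j), keeping only the
    first `terms` summands (an illustrative truncation, adequate for
    moderate mu1 and mu2).
    """
    start = max(0, -k)  # need j >= 0 and j + k >= 0
    return sum(
        poisson_pmf(j + k, mu1) * poisson_pmf(j, mu2)
        for j in range(start, start + terms)
    )


if __name__ == "__main__":
    mu1, mu2 = 3.0, 1.5
    support = range(-40, 41)
    mean = sum(k * skellam_pmf(k, mu1, mu2) for k in support)
    var = sum((k - mean) ** 2 * skellam_pmf(k, mu1, mu2) for k in support)
    # The mean should be close to mu1 - mu2 and the variance to mu1 + mu2.
    print(mean, var)
```

The check in the final lines relies on the standard facts that a difference of independent Poisson variables has mean $\mu_1 - \mu_2$ and variance $\mu_1 + \mu_2$.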
-
Improving approximation error bounds via truncation
Authors:
H. L. Gan
Abstract:
One aspect of Poisson approximation is that the support of the random variable of interest is often finite while the support of the Poisson distribution is not. In this paper we remedy this by examining truncated negative binomial approximation (of which Poisson is a special limiting case), so as to match the supports of the two distributions, and show that this leads to improvements in the error bounds of the approximation.
Submitted 28 April, 2017;
originally announced May 2017.
-
Dirichlet approximation of equilibrium distributions in Cannings models with mutation
Authors:
H. L. Gan,
Adrian Röllin,
Nathan Ross
Abstract:
Consider a haploid population of fixed finite size with a finite number of allele types and having Cannings exchangeable genealogy with neutral mutation. The stationary distribution of the Markov chain of allele counts in each generation is an important quantity in population genetics but has no tractable description in general. We provide upper bounds on the distributional distance between the Dirichlet distribution and this finite population stationary distribution for the Wright-Fisher genealogy with general mutation structure and the Cannings exchangeable genealogy with parent independent mutation structure. In the first case, the bound is small if the population is large and the mutations do not depend too much on parent type; "too much" is naturally quantified by our bound. In the second case, the bound is small if the population is large and the chance of three-mergers in the Cannings genealogy is small relative to the chance of two-mergers; this is the same condition that ensures convergence of the genealogy to Kingman's coalescent. These results follow from a new development of Stein's method for the Dirichlet distribution based on Barbour's generator approach and a probabilistic description of the semigroup of the Wright-Fisher diffusion due to Griffiths, and Li and Tavaré.
Submitted 8 December, 2016; v1 submitted 22 February, 2016;
originally announced February 2016.
-
Conditional Poisson process approximation
Authors:
H. L. Gan
Abstract:
Point processes are an essential tool when we are interested in where in time or space events occur. The basic starting point for point processes is usually the Poisson process. Over the years, Stein's method has been developed with a great deal of success for Poisson point process approximation. When studying rare events, though, one typically only begins modelling after the occurrence of such an event. As a result, a point process that is conditional upon at least one atom is arguably more appropriate in certain applications. In this paper, we develop Stein's method for conditional Poisson point process approximation, and closely examine the difficulties that this conditioning entails. By utilising a characterising immigration-death process, we calculate bounds for the Stein factors.
Submitted 10 November, 2015;
originally announced November 2015.
-
Fragility Distributions and their approximations
Authors:
H. L. Gan,
A. Xia
Abstract:
Given a sequence of $n$ identically distributed random variables with common distribution $F$, the \emph{fragility distribution of order $m$}, represented by $\FD$, is the limit conditional distribution of the number of exceedances given there are at least $m$ exceedances, as the threshold tends to the right end point of $F$. In this paper we are concerned with the existence of $\FD$ and its asymptotic behaviour when $n$ becomes large. For a stationary sequence with its exceedance process converging to a compound Poisson process, we derive an explicit formula for calculating $\lim_{n \to \infty} \FD$. We also establish Stein's method for estimating the errors involved in fragility distribution approximations.
Submitted 5 February, 2014;
originally announced February 2014.
-
Stein factors for negative binomial approximation in Wasserstein distance
Authors:
A. D. Barbour,
H. L. Gan,
A. Xia
Abstract:
The paper gives the bounds on the solutions to a Stein equation for the negative binomial distribution that are needed for approximation in terms of the Wasserstein metric. The proofs are probabilistic, and follow the approach introduced in Barbour and Xia (Bernoulli 12 (2006) 943-954). The bounds are used to quantify the accuracy of negative binomial approximation to parasite counts in hosts. Since the infectivity of a population can be expected to be proportional to its total parasite burden, the Wasserstein metric is the appropriate choice.
Submitted 1 June, 2015; v1 submitted 22 October, 2013;
originally announced October 2013.