-
Almost sure one-endedness of a random graph model of distributed ledgers
Authors:
Jiewei Feng,
Christopher King,
Ken R. Duffy
Abstract:
Blockchain and other decentralized databases, known as distributed ledgers, are designed to store information online where all trusted network members can update the data with transparency. The dynamics of a ledger's development can be mathematically represented by a directed acyclic graph (DAG). One essential property of a properly functioning shared ledger is that all network members holding a copy of the ledger agree on a sequence of information added to the ledger, which is referred to as consensus and is known to be related to a structural property of the DAG called one-endedness. In this paper, we consider a model of a distributed ledger with sequential stochastic arrivals that mimic attachment rules from the IOTA cryptocurrency. We first prove that the number of leaves in the random DAG is bounded by a constant infinitely often through the identification of a suitable martingale, and then prove that a sequence of specific events happens infinitely often. Combining those results, we establish that, as time goes to infinity, the IOTA DAG is almost surely one-ended.
Submitted 18 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Discrete convolution statistic for hypothesis testing
Authors:
Giulio Prevedello,
Ken R. Duffy
Abstract:
The question of testing for equality in distribution between two linear models, each consisting of sums of distinct discrete independent random variables with unequal numbers of observations, has emerged from biological research. In this case, the computation of classical $χ^2$ statistics, which would not include all observations, results in loss of power, especially when sample sizes are small. Here, as an alternative that uses all data, the nonparametric maximum likelihood estimator for the distribution of the sum of discrete and independent random variables, which we call the convolution statistic, is proposed and its limiting normal covariance matrix determined. To challenge null hypotheses about the distribution of this sum, the generalized Wald's method is applied to define a test statistic whose distribution is asymptotically $χ^2$ with as many degrees of freedom as the rank of that covariance matrix. Rank analysis also reveals a connection with the roots of the probability generating functions associated with the addend variables of the linear models. A simulation study is performed to compare the convolution test with Pearson's $χ^2$, and to provide usage guidelines.
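As a rough illustration of the idea of estimating the distribution of a sum from all available observations, the sketch below convolves the empirical probability mass functions of two independent samples of unequal size. It is a minimal plug-in sketch, not the paper's implementation: the function names and the sample values are hypothetical, and the covariance matrix and Wald-type test described in the abstract are not reproduced here.

```python
from collections import Counter
from fractions import Fraction


def empirical_pmf(sample):
    """Map each observed value to its relative frequency (exact fractions)."""
    n = len(sample)
    return {value: Fraction(count, n) for value, count in Counter(sample).items()}


def convolve_pmfs(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


# Hypothetical samples with unequal numbers of observations.
x_obs = [0, 1, 1, 2]
y_obs = [0, 0, 1]
sum_pmf = convolve_pmfs(empirical_pmf(x_obs), empirical_pmf(y_obs))
```

Every observation in both samples contributes to `sum_pmf`, even though the samples cannot be paired one to one.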
Submitted 31 August, 2020;
originally announced August 2020.
-
The variance of the average depth of a pure birth process converges to 7
Authors:
Ken R. Duffy,
Gianfelice Meli,
Seva Shneer
Abstract:
If trees are constructed from a pure birth process and one defines the depth of a leaf to be the number of edges to its root, it is known that the variance in the depth of a randomly selected leaf of a randomly selected tree grows linearly in time. In this letter, we instead consider the variance of the average depth of leaves within each individual tree, establishing that, in contrast, it converges to a constant, $7$. This result indicates that while the variance in leaf depths amongst the ensemble of pure birth processes undergoes large fluctuations, the average depth across individual trees is much more consistent.
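The quantity studied here can be explored numerically. The following sketch grows independent trees and computes the sample variance of their average leaf depths. It assumes, as an illustration only, that every leaf splits at unit rate into two children; the function names, horizon and number of trees are hypothetical choices rather than anything specified in the abstract.

```python
import random
import statistics


def average_leaf_depth(t_end, rng):
    """Grow one tree up to time t_end and return the mean depth of its leaves.

    Assumption: each leaf splits at unit rate into two children, each one
    edge deeper than its parent.
    """
    depths = [0]  # a single root leaf at depth 0
    t = 0.0
    while True:
        # With n leaves, the waiting time to the next split is Exp(n).
        t += rng.expovariate(len(depths))
        if t >= t_end:
            break
        i = rng.randrange(len(depths))  # the splitting leaf is uniform
        d = depths[i] + 1
        depths[i] = d      # first child replaces the parent
        depths.append(d)   # second child
    return sum(depths) / len(depths)


def variance_of_average_depth(t_end, n_trees, seed=0):
    """Sample variance, across trees, of the per-tree average leaf depth."""
    rng = random.Random(seed)
    averages = [average_leaf_depth(t_end, rng) for _ in range(n_trees)]
    return statistics.variance(averages)
```

Under these assumptions the expected number of leaves grows like $e^{t}$, so modest horizons keep the simulation cheap; increasing `t_end` and `n_trees` allows the sample variance to be compared against the limiting constant stated in the abstract.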
Submitted 1 March, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Sample Path Properties of the Average Generation of a Bellman-Harris Process
Authors:
Gianfelice Meli,
Tom S. Weber,
Ken R. Duffy
Abstract:
Motivated by a recently proposed design for a DNA coded randomised algorithm that enables inference of the average generation of a collection of cells descendent from a common progenitor, here we establish strong convergence properties for the average generation of a super-critical Bellman-Harris process. We further extend those results to a two-type Bellman-Harris process where one type can give rise to the other, but not vice versa. These results further affirm the estimation method's potential utility by establishing its long run accuracy on individual sample-paths, and significantly expanding its remit to encompass cellular development that gives rise to differentiated offspring with distinct population dynamics.
Submitted 9 August, 2019; v1 submitted 18 July, 2018;
originally announced July 2018.
-
Estimating large deviation rate functions
Authors:
Ken R. Duffy,
Brendan D. Williamson
Abstract:
Establishing a Large Deviation Principle (LDP) proves to be a powerful result for a vast number of stochastic models in many application areas of probability theory. The key object of an LDP is the large deviations rate function, from which probabilistic estimates of rare events can be determined. In order to make these results empirically applicable, it would be necessary to estimate the rate function from observations. This is the question we address in this article for the best known and most widely used LDP: Cramér's theorem for random walks.
We establish that even when only a narrow LDP holds for Cramér's theorem, as occurs for heavy-tailed increments, one gets an LDP for estimating the random walk's rate function in the space of convex lower-semicontinuous functions equipped with the Attouch-Wets topology via empirical estimates of the moment generating function. This result may seem surprising as it is saying that for Cramér's theorem, one can quickly form non-parametric estimates of the function that governs the likelihood of rare events.
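The estimation scheme described above, replacing the moment generating function by its empirical counterpart and taking a Legendre-Fenchel transform, can be sketched as follows. This is a minimal illustration under stated assumptions: the supremum is approximated over a finite, hypothetical grid of $θ$ values, so the result is a lower bound on the transform of the empirical cumulant generating function, and the function names are not taken from the article.

```python
import math


def empirical_cgf(sample, theta):
    """Logarithm of the empirical moment generating function at theta."""
    m = max(theta * x for x in sample)  # log-sum-exp shift for stability
    return m + math.log(sum(math.exp(theta * x - m) for x in sample) / len(sample))


def rate_function_estimate(sample, x, thetas):
    """Legendre-Fenchel transform of the empirical CGF over a grid of thetas."""
    return max(theta * x - empirical_cgf(sample, theta) for theta in thetas)


# Hypothetical grid and a toy sample of fair +/-1 increments.
thetas = [k / 100 for k in range(-500, 501)]
sample = [-1.0, 1.0]
estimate_at_half = rate_function_estimate(sample, 0.5, thetas)
```

For this toy sample the empirical cumulant generating function is $\log\cosh θ$, whose transform at $x$ is $\frac{1}{2}[(1+x)\log(1+x) + (1-x)\log(1-x)]$ for $|x|<1$, which gives a closed form to check the grid approximation against.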
Submitted 22 June, 2017; v1 submitted 6 November, 2015;
originally announced November 2015.
-
Inferring average generation via division-linked labeling
Authors:
Tom S. Weber,
Leila Perie,
Ken R. Duffy
Abstract:
For proliferating cells subject to both division and death, how can one estimate the average generation number of the living population without continuous observation or a division-diluting dye? In this paper we provide a method for cell systems such that at each division there is an unlikely, heritable one-way label change that has no impact other than to serve as a distinguishing marker. If the probability of label change per cell generation can be determined and the proportion of labeled cells at a given time point can be measured, we establish that the average generation number of living cells can be estimated. Crucially, the estimator does not depend on knowledge of the statistics of cell cycle, death rates or total cell numbers. We validate the estimator and illustrate its features through comparison with published data and physiologically parameterized stochastic simulations, using it to suggest new experimental designs.
Submitted 27 April, 2017; v1 submitted 7 June, 2015;
originally announced June 2015.
-
Tail asymptotics for busy periods
Authors:
Ken R. Duffy,
Sean P. Meyn
Abstract:
The busy period for a queue is cast as the area swept under the random walk until it first returns to zero, $B$. Encompassing non-i.i.d. increments, the large-deviations asymptotics of $B$ is addressed, under the assumption that the increments satisfy standard conditions, including a negative drift. The main conclusions provide insight on the probability of a large busy period, and the manner in which this occurs:
I) The scaled probability of a large busy period has the asymptote, for any $b>0$,
\[ \lim_{n\to\infty} \frac{1}{\sqrt{n}} \log P(B\geq bn) = -K\sqrt{b}, \quad \text{where} \quad K = 2 \sqrt{-\int_0^{λ^*} Λ(θ)\, dθ}, \quad \text{with } λ^*=\sup\{θ:Λ(θ)\leq 0\}, \]
and with $Λ$ denoting the scaled cumulant generating function of the increments process.
II) The most likely path to a large swept area is found to be a simple rescaling of the path on $[0,1]$ given by
\[ ψ^*(t) = -Λ(λ^*(1-t))/λ^*. \]
In contrast to the piecewise linear most likely path leading the random walk to hit a high level, this is strictly concave in general. While these two most likely paths have very different forms, their derivatives coincide at the start of their trajectories, and at their first return to zero.
These results partially answer an open problem of Kulick and Palmowski regarding the tail of the work done during a busy period at a single server queue. The paper concludes with applications of these results to the estimation of the busy period statistics $(λ^*, K)$ based on observations of the increments, offering the possibility of estimating the likelihood of a large busy period in advance of observing one.
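To make the pair $(λ^*, K)$ concrete, the sketch below evaluates both numerically for one illustrative case that is an assumption of this example, not of the paper: i.i.d. Gaussian increments with mean $-μ$ and variance $σ^2$, so that $Λ(θ) = -μθ + σ^2θ^2/2$. For this choice, direct integration gives $λ^* = 2μ/σ^2$ and $K = 2\sqrt{2/3}\,μ^{3/2}/σ^2$, which the code can be checked against. The function names, the bisection bracket and the quadrature resolution are hypothetical.

```python
import math


def cgf_gaussian(theta, mu, sigma):
    """Cumulant generating function of i.i.d. N(-mu, sigma^2) increments."""
    return -mu * theta + 0.5 * (sigma * theta) ** 2


def lambda_star(cgf, upper=1e3, tol=1e-12):
    """Bisection for sup{theta : cgf(theta) <= 0}.

    Assumes cgf is convex with cgf(0) = 0, negative just to the right of 0,
    and positive at `upper`.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cgf(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return lo


def busy_period_constant(cgf, lam, n=10_000):
    """K = 2*sqrt(-integral of cgf over [0, lam]) via the midpoint rule."""
    h = lam / n
    integral = h * sum(cgf((k + 0.5) * h) for k in range(n))
    return 2.0 * math.sqrt(-integral)


mu, sigma = 1.0, 1.0  # hypothetical drift and standard deviation
lam = lambda_star(lambda th: cgf_gaussian(th, mu, sigma))
K = busy_period_constant(lambda th: cgf_gaussian(th, mu, sigma), lam)
```

In the same Gaussian case the most likely path formula reduces to $ψ^*(t) = μ\,t(1-t)$, a strictly concave parabola on $[0,1]$.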
Submitted 10 March, 2012; v1 submitted 8 December, 2011;
originally announced December 2011.
-
Estimating Loynes' exponent
Authors:
Ken R. Duffy,
Sean P. Meyn
Abstract:
Loynes' distribution, which characterizes the one dimensional marginal of the stationary solution to Lindley's recursion, possesses an ultimately exponential tail for a large class of increment processes. If one can observe increments but does not know their probabilistic properties, what are the statistical limits of estimating the tail exponent of Loynes' distribution? We conjecture that in broad generality a consistent sequence of non-parametric estimators can be constructed that satisfies a large deviation principle. We present rigorous support for this conjecture under restrictive assumptions and simulation evidence indicating why we believe it to be true in greater generality.
Submitted 28 October, 2009;
originally announced October 2009.
-
Most likely paths to error when estimating the mean of a reflected random walk
Authors:
Ken R. Duffy,
Sean P. Meyn
Abstract:
It is known that simulation of the mean position of a Reflected Random Walk (RRW) $\{W_n\}$ exhibits non-standard behavior, even for light-tailed increment distributions with negative drift. The Large Deviation Principle (LDP) holds for deviations below the mean, but for deviations at the usual speed above the mean the rate function is null. This paper takes a deeper look at this phenomenon. Conditional on a large sample mean, a complete sample path LDP analysis is obtained. Let $I$ denote the rate function for the one dimensional increment process. If $I$ is coercive, then given a large simulated mean position, under general conditions our results imply that the most likely asymptotic behavior, $ψ^*$, of the paths $n^{-1} W_{\lfloor tn\rfloor}$ is to be zero apart from on an interval $[T_0,T_1]\subset[0,1]$ and to satisfy the functional equation \begin{align*} \nabla I\left(\frac{d}{dt}ψ^*(t)\right)=λ^*(T_1-t) \quad \text{whenever } ψ(t)\neq 0. \end{align*} If $I$ is non-coercive, a similar, but slightly more involved, result holds.
These results prove, in broad generality, that Monte Carlo estimates of the steady-state mean position of an RRW have a high likelihood of over-estimation. This has serious implications for the performance evaluation of queueing systems by simulation techniques where steady state expected queue-length and waiting time are key performance metrics. The results show that naïve estimates of these quantities from simulation are highly likely to be conservative.
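The kind of naïve Monte Carlo estimate discussed here can be sketched as follows. The recursion $W_{k+1} = \max(W_k + X_{k+1}, 0)$ with $W_0 = 0$ is the standard reflected random walk and is assumed here rather than quoted from the abstract; the Gaussian increments, the run length and the function name are hypothetical choices for illustration.

```python
import random


def reflected_walk_sample_mean(increments):
    """Sample mean of W_1..W_n with W_0 = 0 and W_{k+1} = max(W_k + X_{k+1}, 0)."""
    w, total = 0.0, 0.0
    for x in increments:
        w = max(w + x, 0.0)  # reflect at zero
        total += w
    return total / len(increments)


rng = random.Random(0)
# Hypothetical light-tailed increments with negative drift: N(-0.5, 1).
estimate = reflected_walk_sample_mean(
    [rng.gauss(-0.5, 1.0) for _ in range(100_000)]
)
```

Repeating the run over many seeds and inspecting the spread of `estimate` is one simple way to look at the asymmetric deviations that the abstract describes.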
Submitted 25 June, 2010; v1 submitted 24 June, 2009;
originally announced June 2009.