-
Stochastic ordering, attractiveness and couplings in non-conservative particle systems
Authors:
Raúl Gouet,
F. Javier López,
Gerardo Sanz
Abstract:
We analyse the stochastic comparison of interacting particle systems allowing for multiple arrivals, departures and non-conservative jumps of individuals between sites. That is, if $k$ individuals leave site $x$ for site $y$, a possibly different number $l$ arrive at the destination. This setting includes new models, when compared to the conservative case, such as metapopulation models with deaths during migrations. It implies a sharp increase of technical complexity, given the numerous changes to consider. Known results are significantly generalised, even in the conservative case, as no particular form of the transition rates is assumed.
We obtain necessary and sufficient conditions on the rates for the stochastic comparison of the processes and prove their equivalence with the existence of an order-preserving Markovian coupling. As a corollary, we get necessary and sufficient conditions for the attractiveness of the processes. A salient feature of our approach lies in the presentation of the coupling in terms of solutions to network flow problems.
We illustrate the applicability of our results to a flexible family of population models described as interacting particle systems, with a range of parameters controlling births, deaths, catastrophes or migrations. We provide explicit conditions on the parameters for the stochastic comparison and attractiveness of the models, showing their usefulness in studying their limit behaviour. Additionally, we give three examples of constructing the coupling.
Submitted 4 April, 2025;
originally announced April 2025.
-
Characterisation of distributions through $δ$-records and martingales
Authors:
Raúl Gouet,
Miguel Lafuente,
F. Javier López,
Gerardo Sanz
Abstract:
Given parameters $c>0, δ\ne0$ and a sequence $(X_n)$ of real-valued, integrable, independent and identically $F$-distributed random variables, we characterise distributions $F$ such that $(N_n-cM_n)$ is a martingale, where $N_n$ denotes the number of observations $X_k$ among $X_1,\ldots,X_n$ such that $X_k>M_{k-1}+δ$, called $δ$-records, and $M_k=\max\{X_1,\ldots, X_k\}$.
The problem is recast as $1-F(x+δ)=c\int_{x}^{\infty}(1-F)(t)dt$, for $x\in T$, with $F(T)=1$. Unlike standard functional equations, where the equality must hold for all $x$ in a fixed set, our problem involves a domain that depends on $F$ itself, introducing complexity but allowing for more possibilities of solutions.
We find the explicit expressions of all solutions when $δ< 0$ and, when $δ> 0$, for distributions with bounded support. In the unbounded support case, we focus attention on continuous and lattice distributions. In the continuous setting, with support $\mathbb{R}_+$, we reduce the problem to a delay differential equation, showing that, besides particular cases of the exponential distribution, mixtures of exponential and gamma distributions and many others are solutions as well. The lattice case, with support $\mathbb{Z}_+$, is treated analogously and reduced to the study of a difference equation. Analogous results are obtained; in particular, mixtures of geometric and negative binomial distributions are found to solve the problem.
Submitted 2 April, 2025;
originally announced April 2025.
-
Exact and asymptotic properties of $δ$-records in the linear drift model
Authors:
Raúl Gouet,
Miguel Lafuente,
F. Javier López,
Gerardo Sanz
Abstract:
The study of records in the Linear Drift Model (LDM) has attracted much attention recently due to applications in several fields. In the present paper we study $δ$-records in the LDM, defined as observations which are greater than all previous observations, plus a fixed real quantity $δ$. We give analytical properties of the probability of $δ$-records and study the correlation between $δ$-record events. We also analyse the asymptotic behaviour of the number of $δ$-records among the first $n$ observations and give conditions for convergence to the Gaussian distribution. As a consequence of our results, we solve a conjecture posed in J. Stat. Mech. 2010, P10013, regarding the total number of records in an LDM with negative drift. Examples of application to particular distributions, such as Gumbel or Pareto, are also provided. We illustrate our results with a real data set of summer temperatures in Spain, where the LDM is consistent with the global-warming phenomenon.
Submitted 11 June, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Asymptotics of the overflow in urn models
Authors:
Raul Gouet,
Paweł Hitczenko,
Jacek Wesołowski
Abstract:
Consider a number, finite or not, of urns each with fixed capacity $r$ and balls randomly distributed among them. The overflow is the number of balls that are assigned to urns that already contain $r$ balls. When $r=1$, using analytic methods, Hwang and Janson gave conditions under which the overflow (which in this case is just the number of balls landing in non-empty urns) has an asymptotically Poisson distribution as the number of balls grows to infinity. Our aim here is to systematically study the asymptotics of the overflow in the general situation, i.e., for arbitrary $r$. In particular, we provide sufficient conditions for both Poissonian and normal asymptotics for general $r$, thus extending Hwang and Janson's work. Our approach relies on purely probabilistic methods.
Submitted 16 May, 2019;
originally announced May 2019.
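
The overflow defined in the abstract above can be computed directly from a sequence of urn assignments. The following is a minimal sketch, not the authors' method: the function name `overflow` and the representation of the allocation as a list of urn labels are assumptions made for illustration.

```python
from collections import Counter


def overflow(assignments, r):
    """Count balls assigned to urns that already contain r balls.

    `assignments` is a hypothetical encoding: the i-th entry is the label
    of the urn chosen by the i-th ball. Each urn has capacity r.
    """
    contents = Counter()  # number of balls currently held by each urn
    spilled = 0
    for urn in assignments:
        if contents[urn] >= r:
            spilled += 1        # urn is full: this ball overflows
        else:
            contents[urn] += 1  # urn still has room: ball is stored
    return spilled


balls = [0, 1, 0, 0, 2, 1]
print(overflow(balls, 1))  # balls landing in non-empty urns
print(overflow(balls, 2))
```

With $r=1$ the function counts balls landing in non-empty urns, which matches the special case mentioned in the abstract.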
-
Minimax convergence rate for estimating the Wasserstein barycenter of random measures on the real line
Authors:
Jérémie Bigot,
Raúl Gouet,
Thierry Klein,
Alfredo López
Abstract:
This paper is focused on the statistical analysis of probability measures $ν_{1},\ldots,ν_{n}$ on $\mathbb{R}$ that can be viewed as independent realizations of an underlying stochastic process. We consider the situation of practical importance where the random measures $ν_{i}$ are absolutely continuous with densities $f_{i}$ that are not directly observable. In this case, instead of the densities, we have access to datasets of real random variables $(X_{i,j})_{1 \leq i \leq n; \; 1 \leq j \leq p_{i} }$ organized in the form of $n$ experimental units, such that $X_{i,1},\ldots,X_{i,p_{i}}$ are iid observations sampled from a random measure $ν_{i}$ for each $1 \leq i \leq n$. In this setting, we focus on first-order statistics methods for estimating, from such data, a meaningful structural mean measure. For the purpose of taking into account phase and amplitude variations in the observations, we argue that the notion of Wasserstein barycenter is a relevant tool. The main contribution of this paper is to characterize the rate of convergence of a (possibly smoothed) empirical Wasserstein barycenter towards its population counterpart in the asymptotic setting where both $n$ and $\min_{1 \leq i \leq n} p_{i}$ may go to infinity. The optimality of this procedure is discussed from the minimax point of view with respect to the Wasserstein metric. We also highlight the connection between our approach and the curve registration problem in statistics. Some numerical experiments are used to illustrate the results of the paper on the convergence rate of empirical Wasserstein barycenters.
Submitted 28 March, 2017; v1 submitted 13 June, 2016;
originally announced June 2016.
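
On the real line, the quantile function of the 2-Wasserstein barycenter is the average of the quantile functions of the input measures. The sketch below illustrates this for empirical measures only, under the simplifying assumption that all samples have the same size, so that averaging quantile functions reduces to averaging order statistics. It applies no smoothing and is not claimed to be the estimator analysed in the paper; the name `barycenter_atoms` is hypothetical.

```python
def barycenter_atoms(samples):
    """Atoms of the empirical 2-Wasserstein barycenter on the real line.

    Assumption: every sample has the same size p, so each empirical
    quantile function is a step function with the same breakpoints and
    the barycenter puts mass 1/p on the averaged order statistics.
    """
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("this sketch requires samples of equal size")
    ordered = [sorted(s) for s in samples]
    n = len(ordered)
    # Average the j-th order statistic across the n experimental units.
    return [sum(column) / n for column in zip(*ordered)]


units = [[0.0, 2.0], [3.0, 1.0]]
print(barycenter_atoms(units))
```

Handling unequal sample sizes $p_i$ would require evaluating the empirical quantile functions on a common grid, which this sketch deliberately leaves out.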
-
Geodesic PCA in the Wasserstein space
Authors:
Jérémie Bigot,
Raúl Gouet,
Thierry Klein,
Alfredo López
Abstract:
We introduce the method of Geodesic Principal Component Analysis (GPCA) on the space of probability measures on the line, with finite second moment, endowed with the Wasserstein metric. We discuss the advantages of this approach, over a standard functional PCA of probability densities in the Hilbert space of square-integrable functions. We establish the consistency of the method by showing that the empirical GPCA converges to its population counterpart, as the sample size tends to infinity. A key property in the study of GPCA is the isometry between the Wasserstein space and a closed convex subset of the space of square-integrable functions, with respect to an appropriate measure. Therefore, we consider the general problem of PCA in a closed convex subset of a separable Hilbert space, which serves as basis for the analysis of GPCA and also has interest in its own right. We provide illustrative examples on simple statistical models, to show the benefits of this approach for data analysis. The method is also applied to a real dataset of population pyramids.
Submitted 3 October, 2014; v1 submitted 29 July, 2013;
originally announced July 2013.
-
Extrapolation of Urn Models via Poissonization: Accurate Measurements of the Microbial Unknown
Authors:
Manuel Lladser,
Raúl Gouet,
Jens Reeder
Abstract:
The availability of high-throughput parallel methods for sequencing microbial communities is increasing our knowledge of the microbial world at an unprecedented rate. Though most attention has focused on determining lower bounds on the alpha-diversity, i.e., the total number of different species present in the environment, tight bounds on this quantity may be highly uncertain because a small fraction of the environment could be composed of a vast number of different species. To better assess what remains unknown, we propose instead to predict the fraction of the environment that belongs to unsampled classes. Modeling samples as draws with replacement of colored balls from an urn with an unknown composition, and under the sole assumption that there are still undiscovered species, we show that conditionally unbiased predictors and exact prediction intervals (of constant length in logarithmic scale) are possible for the fraction of the environment that belongs to unsampled classes. Our predictions are based on a Poissonization argument, which we have implemented in what we call the Embedding algorithm. For fixed, i.e., non-randomized, sample sizes, the algorithm leads to very accurate predictions on a sub-sample of the original sample. We quantify the effect of fixed sample sizes on our prediction intervals and test our methods and others found in the literature against simulated environments, which we devise taking into account datasets from human gut and hand microbiota. Our methodology applies to any dataset that can be conceptualized as a sample with replacement from an urn. In particular, it could be applied, for example, to quantify the proportion of all the unseen solutions to a binding site problem in a random RNA pool, or to reassess the surveillance of a certain terrorist group, predicting the conditional probability that it deploys a new tactic in its next attack.
Submitted 14 September, 2011;
originally announced September 2011.
-
Asymptotic normality for the counting process of weak records and δ-records in discrete models
Authors:
Raúl Gouet,
F. Javier López,
Gerardo Sanz
Abstract:
Let $\{X_n,n\ge1\}$ be a sequence of independent and identically distributed random variables, taking non-negative integer values, and call $X_n$ a $δ$-record if $X_n>\max\{X_1,\ldots,X_{n-1}\}+δ$, where $δ$ is an integer constant. We use martingale arguments to show that the counting process of $δ$-records among the first $n$ observations, suitably centered and scaled, is asymptotically normally distributed for $δ\ne0$. In particular, taking $δ=-1$ we obtain a central limit theorem for the number of weak records.
Submitted 5 September, 2007;
originally announced September 2007.
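
The definition of a $δ$-record in the abstract above translates into a short counting routine. This is a minimal sketch for illustration: the function name `count_delta_records` is hypothetical, and the convention that the first observation always counts (the maximum of an empty set taken as $-\infty$) is an assumption not stated in the abstract.

```python
import math


def count_delta_records(xs, delta):
    """Count observations X_n with X_n > max{X_1, ..., X_{n-1}} + delta.

    Assumption: the running maximum starts at -infinity, so the first
    observation is always counted as a delta-record.
    """
    running_max = -math.inf
    count = 0
    for x in xs:
        if x > running_max + delta:
            count += 1
        running_max = max(running_max, x)
    return count


sample = [2, 2, 1, 3, 3, 5]
print(count_delta_records(sample, -1))  # weak records: ties with the maximum count
print(count_delta_records(sample, 0))   # usual (strict) records
print(count_delta_records(sample, 1))   # must exceed the maximum by more than 1
```

For integer-valued data, $δ=-1$ makes the condition equivalent to $X_n \ge \max\{X_1,\ldots,X_{n-1}\}$, which is why that choice recovers weak records.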