-
Bayesian nonparametric inference on a Fréchet class
Authors:
Emanuela Dreassi,
Luca Pratelli,
Pietro Rigo
Abstract:
Let $(\mathcal{X},\mathcal{F},μ)$ and $(\mathcal{Y},\mathcal{G},ν)$ be probability spaces and $(Z_n)$ a sequence of random variables with values in $(\mathcal{X}\times\mathcal{Y},\,\mathcal{F}\otimes\mathcal{G})$. Let $Γ(μ,ν)$ be the collection of all probability measures $p$ on $\mathcal{F}\otimes\mathcal{G}$ such that $$p\bigl(A\times\mathcal{Y}\bigr)=μ(A)\quad\text{and}\quad p\bigl(\mathcal{X}\times B\bigr)=ν(B)\quad\text{for all }A\in\mathcal{F}\text{ and }B\in\mathcal{G}.$$ In this paper, we build some probability measures $Π$ on $Γ(μ,ν)$. In addition, for each such $Π$, we assume that $(Z_n)$ is exchangeable with de Finetti's measure $Π$ and we evaluate the conditional distribution $Π(\cdot\mid Z_1,\ldots,Z_n)$. In Bayesian nonparametrics, if $(Z_1,\ldots, Z_n)$ are the available data, $Π$ and $Π(\cdot\mid Z_1,\ldots, Z_n)$ can be regarded as the prior and the posterior, respectively. To support this interpretation, it suffices to think of a problem where the unknown probability distribution of some bivariate phenomenon is constrained to have marginals $μ$ and $ν$. Finally, analogous results are obtained for the set $Γ(μ)$ of those probability measures on $\mathcal{F}\otimes\mathcal{G}$ with marginal $μ$ on $\mathcal{F}$ (but arbitrary marginal on $\mathcal{G}$). That is, we introduce some priors on $Γ(μ)$ and we evaluate the corresponding posteriors.
Submitted 22 February, 2025;
originally announced February 2025.
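As a quick check that the Fréchet class $Γ(μ,ν)$ above is never empty (a standard fact, not a result claimed in the abstract), the product measure $μ\otimesν$ has the required marginals: $$(μ\otimesν)\bigl(A\times\mathcal{Y}\bigr)=μ(A)\,ν(\mathcal{Y})=μ(A)\quad\text{and}\quad(μ\otimesν)\bigl(\mathcal{X}\times B\bigr)=μ(\mathcal{X})\,ν(B)=ν(B)\quad\text{for all }A\in\mathcal{F}\text{ and }B\in\mathcal{G}.$$ So $Γ(μ,ν)$ always contains the independence coupling; its other members describe dependence between the two coordinates.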
-
Knockoffs for exchangeable categorical covariates
Authors:
Emanuela Dreassi,
Luca Pratelli,
Pietro Rigo
Abstract:
Let $X=(X_1,\ldots,X_p)$ be a $p$-variate random vector and $F$ a fixed finite set. In a number of applications, mainly in genetics, it turns out that $X_i\in F$ for each $i=1,\ldots,p$. Despite the latter fact, to obtain a knockoff $\widetilde{X}$ (in the sense of \cite{CFJL18}), $X$ is usually modeled as an absolutely continuous random vector. While comprehensible from the point of view of applications, this approximate procedure does not make sense theoretically, since $X$ is supported by the finite set $F^p$. In this paper, explicit formulae for the joint distribution of $(X,\widetilde{X})$ are provided when $P(X\in F^p)=1$ and $X$ is exchangeable or partially exchangeable. In fact, when $X_i\in F$ for all $i$, there seem to be various reasons for assuming $X$ exchangeable or partially exchangeable. The robustness of $\widetilde{X}$, with respect to de Finetti's measure $π$ of $X$, is investigated as well. Let $\mathcal{L}_π(\widetilde{X}\mid X=x)$ denote the conditional distribution of $\widetilde{X}$, given $X=x$, when de Finetti's measure is $π$. It is shown that $$\lVert\mathcal{L}_{π_1}(\widetilde{X}\mid X=x)-\mathcal{L}_{π_2}(\widetilde{X}\mid X=x)\rVert\le c(x)\,\lVertπ_1-π_2\rVert$$ where $\lVert\cdot\rVert$ is total variation distance and $c(x)$ a suitable constant. Finally, a numerical experiment is performed. Overall, the knockoffs of this paper outperform the alternatives (i.e., the knockoffs obtained by giving $X$ an absolutely continuous distribution) as regards the false discovery rate but are slightly weaker in terms of power.
Submitted 13 October, 2024;
originally announced October 2024.
-
Uncertainty, Imprecise Probabilities and Interval Capacity Measures on a Product Space
Authors:
Marcello Basili,
Luca Pratelli
Abstract:
In Basili and Pratelli (2024), a novel and coherent concept of interval probability measures has been introduced, providing a method for representing imprecise probabilities and uncertainty. Within the framework of set algebra, we introduced the concepts of weak complementation and interval probability measures associated with a family of random variables, which effectively capture the inherent uncertainty in any event. This paper conducts a comprehensive analysis of these concepts within a specific probability space. Additionally, we elaborate on an updating rule for events, integrating essential concepts of statistical independence, dependence, and stochastic dominance.
Submitted 23 April, 2024;
originally announced April 2024.
-
Asymptotics of predictive distributions driven by sample means and variances
Authors:
Samuele Garelli,
Fabrizio Leisen,
Luca Pratelli,
Pietro Rigo
Abstract:
Let $α_n(\cdot)=P\bigl(X_{n+1}\in\cdot\mid X_1,\ldots,X_n\bigr)$ be the predictive distributions of a sequence $(X_1,X_2,\ldots)$ of $p$-dimensional random vectors. Suppose $$α_n= \mathcal{N} _p (M_n,Q_n)$$ where $M_n=\frac{1}{n}\sum_{i=1}^nX_i$ and $Q_n=\frac{1}{n}\sum_{i=1}^n(X_i-M_n)(X_i-M_n)^t$. Then, there is a random probability measure $α$ on the Borel subsets of $\mathbb{R}^p$ such that $\lVertα_n-α\rVert\overset{a.s.}\longrightarrow 0$ where $\lVert\cdot\rVert$ is total variation distance. An explicit expression for $α$ is provided and the convergence rate of $\lVertα_n-α\rVert$ is shown to be arbitrarily close to $n^{-1/2}$. Moreover, it is still true that $\lVertα_n-α\rVert\overset{a.s.}\longrightarrow 0$ even if $α_n=\mathcal{L}(M_n,Q_n)$ where $\mathcal{L}$ belongs to a class of distributions much larger than the normal. The predictives $α_n$ are useful in various frameworks, including Bayesian predictive inference and predictive resampling. Finally, the asymptotic behavior of copula-based predictive distributions (introduced in [13]) is investigated and a numerical experiment is performed.
Submitted 15 September, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
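The predictive rule $α_n=\mathcal{N}_p(M_n,Q_n)$ in the abstract above can be illustrated with a minimal sketch of predictive resampling in the univariate case $p=1$. The function name `predictive_resample`, the starting sample and the seed are assumptions made for illustration; this is not the authors' code or their numerical experiment.

```python
import math
import random


def predictive_resample(initial, steps, seed=0):
    """Extend `initial` by repeatedly drawing X_{n+1} ~ N(M_n, Q_n) (case p = 1).

    Hypothetical helper: M_n is the sample mean and Q_n the sample variance
    with divisor n, as in the abstract.
    """
    rng = random.Random(seed)
    xs = list(initial)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    path = []
    for _ in range(steps):
        # Draw the next observation from the current predictive N(M_n, Q_n).
        x_new = rng.gauss(mean, math.sqrt(var))
        # Running (Welford-style) updates of M_n and Q_n.
        n += 1
        delta = x_new - mean
        mean += delta / n
        var += (delta * (x_new - mean) - var) / n
        xs.append(x_new)
        path.append((mean, var))
    return xs, path


if __name__ == "__main__":
    _, path = predictive_resample([0.0, 1.0, 2.0], steps=5000)
    print(path[-1])  # (M_n, Q_n) after 5000 simulated observations
```

Running the sketch with different seeds gives different terminal pairs $(M_n,Q_n)$, which is one informal way to see that the limit $α$ is random.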
-
A new approach for imprecise probabilities
Authors:
Marcello Basili,
Luca Pratelli
Abstract:
This paper introduces a novel concept of interval probability measures that enables the representation of imprecise probabilities, or uncertainty, in a natural and coherent manner. Within an algebra of sets, we introduce a notion of weak complementation denoted as $ψ$. The interval probability measure of an event $H$ is defined with respect to the set of indecisive eventualities $(ψ(H))^c$, which is included in the standard complement $H^c$.
We characterize a broad class of interval probability measures and define their properties. Additionally, we establish an updating rule with respect to $H$, incorporating concepts of statistical independence and dependence. The interval distribution of a random variable is formulated, and a corresponding definition of stochastic dominance between two random variables is introduced. As a byproduct, a formal solution to the century-old Keynes-Ramsey controversy is presented.
Submitted 4 February, 2024;
originally announced February 2024.
-
A probabilistic view on predictive constructions for Bayesian learning
Authors:
Patrizia Berti,
Emanuela Dreassi,
Fabrizio Leisen,
Pietro Rigo,
Luca Pratelli
Abstract:
Given a sequence $X=(X_1,X_2,\ldots)$ of random observations, a Bayesian forecaster aims to predict $X_{n+1}$ based on $(X_1,\ldots,X_n)$ for each $n\ge 0$. To this end, in principle, she only needs to select a collection $σ=(σ_0,σ_1,\ldots)$, called a ``strategy'' in what follows, where $σ_0(\cdot)=P(X_1\in\cdot)$ is the marginal distribution of $X_1$ and $σ_n(\cdot)=P(X_{n+1}\in\cdot\mid X_1,\ldots,X_n)$ the $n$-th predictive distribution. Because of the Ionescu-Tulcea theorem, $σ$ can be assigned directly, without passing through the usual prior/posterior scheme. One main advantage is that no prior probability is to be selected. In a nutshell, this is the predictive approach to Bayesian learning. A concise review of the latter is provided in this paper. We try to put such an approach in the right framework, to clear up a few misunderstandings, and to provide a unifying view. Some recent results are discussed as well. In addition, some new strategies are introduced and the corresponding distribution of the data sequence $X$ is determined. The strategies concern generalized Pólya urns, random change points, covariates and stationary sequences.
Submitted 27 January, 2023; v1 submitted 14 August, 2022;
originally announced August 2022.
-
Bayesian predictive inference without a prior
Authors:
Patrizia Berti,
Emanuela Dreassi,
Fabrizio Leisen,
Pietro Rigo,
Luca Pratelli
Abstract:
Let $(X_n:n\ge 1)$ be a sequence of random observations. Let $σ_n(\cdot)=P\bigl(X_{n+1}\in\cdot\mid X_1,\ldots,X_n\bigr)$ be the $n$-th predictive distribution and $σ_0(\cdot)=P(X_1\in\cdot)$ the marginal distribution of $X_1$. In a Bayesian framework, to make predictions on $(X_n)$, one only needs the collection $σ=(σ_n:n\ge 0)$. Because of the Ionescu-Tulcea theorem, $σ$ can be assigned directly, without passing through the usual prior/posterior scheme. One main advantage is that no prior probability has to be selected. In this paper, $σ$ is subjected to two requirements: (i) The resulting sequence $(X_n)$ is conditionally identically distributed, in the sense of Berti, Pratelli and Rigo (2004); (ii) Each $σ_{n+1}$ is a simple recursive update of $σ_n$. Various new $σ$ satisfying (i)-(ii) are introduced and investigated. For such $σ$, the asymptotics of $σ_n$, as $n\rightarrow\infty$, is determined. In some cases, the probability distribution of $(X_n)$ is also evaluated.
Submitted 26 April, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
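To make requirement (ii) in the abstract above concrete, here is a sketch of a classical strategy with a simple recursive update, the Pólya (Dirichlet) sequence, where $σ_{n+1}=\frac{n+c}{n+c+1}\,σ_n+\frac{1}{n+c+1}\,δ_{X_{n+1}}$ for a constant $c>0$. This well-known example is exchangeable, hence conditionally identically distributed; it is not one of the new strategies introduced in the paper. The function name `polya_sequence`, the parameter `c` and the standard normal choice of $σ_0$ are assumptions for illustration.

```python
import random


def polya_sequence(n, c=1.0, base_draw=None, seed=0):
    """Sample X_1..X_n from the predictives sigma_i = (c*sigma_0 + sum of point masses) / (i + c).

    Hypothetical helper: `base_draw(rng)` samples from sigma_0 and defaults
    to a standard normal.
    """
    rng = random.Random(seed)
    if base_draw is None:
        base_draw = lambda r: r.gauss(0.0, 1.0)
    xs = []
    for i in range(n):
        # With probability c / (i + c) draw a fresh value from sigma_0;
        # otherwise repeat one of the i past observations, chosen uniformly.
        if rng.random() < c / (i + c):
            xs.append(base_draw(rng))
        else:
            xs.append(rng.choice(xs))
    return xs


if __name__ == "__main__":
    sample = polya_sequence(20, c=2.0)
    print(len(set(sample)), "distinct values among", len(sample))
```

No prior is specified anywhere in the sketch: the sequence is generated from the predictive distributions alone, which is the point of the approach described in the abstract.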
-
New perspectives on knockoffs construction
Authors:
Patrizia Berti,
Emanuela Dreassi,
Fabrizio Leisen,
Luca Pratelli,
Pietro Rigo
Abstract:
Let $Λ$ be the collection of all probability distributions for $(X,\widetilde{X})$, where $X$ is a fixed random vector and $\widetilde{X}$ ranges over all possible knockoff copies of $X$ (in the sense of \cite{CFJL18}). Three topics are developed in this paper: (i) A new characterization of $Λ$ is proved; (ii) A certain subclass of $Λ$, defined in terms of copulas, is introduced; (iii) The (meaningful) special case where the components of $X$ are conditionally independent is treated in depth. In real problems, after observing $X=x$, each of points (i)-(ii)-(iii) may be useful to generate a value $\widetilde{x}$ for $\widetilde{X}$ conditionally on $X=x$.
Submitted 29 July, 2022; v1 submitted 15 April, 2021;
originally announced April 2021.
-
A note on a universal random variate generator for integer-valued random variables
Authors:
Lucio Barabesi,
Luca Pratelli
Abstract:
A universal generator for integer-valued square-integrable random variables is introduced. The generator relies on a rejection technique based on a generalization of the inversion formula for integer-valued random variables. The proposal gives rise to a simple algorithm which may be implemented in a few lines of code and which may show good performance when the classical families of distributions - such as the Poisson and the Binomial - are considered. In addition, the method is suitable for the computer generation of integer-valued random variables which have closed-form characteristic functions, but do not possess a probability function expressible in a simple analytical way. As an example of such a framework, an application to the Poisson-Tweedie distribution is provided.
Submitted 5 November, 2012;
originally announced November 2012.
-
Statistical inference on the h-index with an application to top-scientist performance
Authors:
Alberto Baccini,
Lucio Barabesi,
Marzia Marcheselli,
Luca Pratelli
Abstract:
Despite the huge literature on the h-index, few papers have been devoted to its statistical analysis when a probability distribution is assumed for the citation counts. The present contribution presents the available inferential techniques, providing the details for proper point and set estimation of the theoretical h-index. Moreover, some issues of simultaneous inference - aimed at producing suitable comparisons among scholars - are addressed. Finally, an analysis of the citation datasets for the Nobel Laureates (in the last five years) and for the Fields medallists (from 2002 onward) is presented.
Submitted 20 May, 2012;
originally announced May 2012.
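For readers unfamiliar with the quantity being estimated in the abstract above, the following sketch computes the empirical h-index of a list of citation counts, i.e. the largest $h$ such that at least $h$ papers have at least $h$ citations each. It only illustrates the usual definition; the inferential techniques of the paper are not reproduced here, and the function name `h_index` is an assumption.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))
```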
-
Comment: Gibbs Sampling, Exponential Families and Orthogonal Polynomials
Authors:
Patrizia Berti,
Guido Consonni,
Luca Pratelli,
Pietro Rigo
Abstract:
Comment on ``Gibbs Sampling, Exponential Families and Orthogonal Polynomials'' [arXiv:0808.3852]
Submitted 28 August, 2008;
originally announced August 2008.