-
The complex behaviour of Galton rank order statistic
Authors:
E. del Barrio,
J. A. Cuesta-Albertos,
C. Matran
Abstract:
Galton's rank order statistic is one of the oldest statistical tools for two-sample comparisons. It is also a very natural index to measure departures from stochastic dominance. Yet, its asymptotic behaviour has been investigated only partially, under restrictive assumptions. This work provides a comprehensive study of this behaviour, based on the analysis of the so-called contact set (a modification of the set in which the quantile functions coincide). We show that a.s. convergence to the population counterpart holds if and only if the contact set has zero Lebesgue measure. When this set is finite we show that the asymptotic behaviour is determined by the local behaviour of a suitable reparameterization of the quantile functions in a neighbourhood of the contact points. Regular crossings result in standard rates and Gaussian limiting distributions, but higher order contacts (in the sense introduced in this work) or contacts at the extremes of the supports may result in different rates and non-Gaussian limits.
Submitted 4 February, 2021;
originally announced February 2021.
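As a rough illustration of the statistic discussed in the abstract above, the following sketch computes a Galton-type rank order statistic for two samples. It assumes the simplest setting, which the abstract does not spell out: both samples have the same size, and the statistic is taken as the proportion of ranks at which the ordered values of the first sample exceed those of the second. The function name `galton_rank_statistic` and the sample values are illustrative only.

```python
def galton_rank_statistic(xs, ys):
    """Proportion of ranks i at which the i-th smallest x exceeds the i-th smallest y.

    Assumption (not from the listing): equal sample sizes, so the order
    statistics of the two samples can be paired rank by rank.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # Pair the order statistics and count the ranks where x is strictly larger.
    exceed = sum(1 for x, y in zip(sorted(xs), sorted(ys)) if x > y)
    return exceed / len(xs)


# Illustrative (made-up) samples.
sample_x = [2.1, 0.4, 3.3, 1.8]
sample_y = [1.9, 0.7, 2.5, 1.0]
print(galton_rank_statistic(sample_x, sample_y))  # 0.75
```

Values near 0 or 1 suggest that one sample is dominated by the other across ranks, while intermediate values indicate that the empirical quantile functions cross.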
-
On a projection-based class of uniformity tests on the hypersphere
Authors:
Eduardo García-Portugués,
Paula Navarro-Esteban,
Juan A. Cuesta-Albertos
Abstract:
We propose a projection-based class of uniformity tests on the hypersphere using statistics that integrate, along all possible directions, the weighted quadratic discrepancy between the empirical cumulative distribution function of the projected data and the projected uniform distribution. Simple expressions for several test statistics are obtained for the circle and sphere, and relatively tractable forms for higher dimensions. Despite its different origin, the proposed class is shown to be related to the well-studied Sobolev class of uniformity tests. Our new class proves advantageous: it makes it possible to derive new tests for hyperspherical data that neatly extend the circular tests by Watson, Ajne, and Rothman, and it introduces the first instance of an Anderson-Darling-like test for such data. The asymptotic distributions and the local optimality against certain alternatives of the new tests are obtained. A simulation study evaluates the theoretical findings and shows that, for certain scenarios, the new tests are competitive against previous proposals. The new tests are employed in three astronomical applications.
Submitted 21 September, 2020; v1 submitted 22 August, 2020;
originally announced August 2020.
-
On Perfect Classification and Clustering for Gaussian Processes
Authors:
Juan A. Cuesta-Albertos,
Subhajit Dutta
Abstract:
In this paper, we propose a data-based transformation for infinite-dimensional Gaussian processes and derive its limit theorem. For a classification problem, this transformation induces complete separation among the associated Gaussian processes. The misclassification probability of any simple classifier applied to the transformed data converges asymptotically to zero. In a clustering problem using mixture models, an appropriate modification of this transformation asymptotically leads to perfect separation of the populations. Theoretical properties are studied for the usual $k$-means clustering method when used on the transformed data. Good empirical performance of the proposed methodology is demonstrated using simulated as well as benchmark data sets, in comparison with some popular parametric and nonparametric methods for such functional data.
Submitted 23 March, 2022; v1 submitted 16 February, 2016;
originally announced February 2016.
-
A fixed-point approach to barycenters in Wasserstein space
Authors:
Pedro C. Álvarez-Esteban,
E. del Barrio,
J. A. Cuesta-Albertos,
C. Matrán
Abstract:
Let $\mathcal{P}_{2,ac}$ be the set of Borel probabilities on $\mathbb{R}^d$ with finite second moment and absolutely continuous with respect to Lebesgue measure. We consider the problem of finding the barycenter (or Fréchet mean) of a finite set of probabilities $ν_1,\ldots,ν_k \in \mathcal{P}_{2,ac}$ with respect to the $L_2$-Wasserstein metric. For this task we introduce an operator on $\mathcal{P}_{2,ac}$ related to the optimal transport maps pushing forward any $μ\in \mathcal{P}_{2,ac}$ to $ν_1,\ldots,ν_k$. Under very general conditions we prove that the barycenter must be a fixed point for this operator and introduce an iterative procedure which consistently approximates the barycenter. The procedure allows effective computation of barycenters in any location-scatter family, including the Gaussian case. In such cases the barycenter must belong to the family, thus it is characterized by its mean and covariance matrix. While its mean is just the weighted mean of the means of the probabilities, the covariance matrix is characterized in terms of their covariance matrices $Σ_1,\dots,Σ_k$ through a nonlinear matrix equation. The performance of the iterative procedure in this case is illustrated through numerical simulations, which show fast convergence towards the barycenter.
Submitted 22 April, 2016; v1 submitted 17 November, 2015;
originally announced November 2015.
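For orientation, the Gaussian case mentioned in the abstract above can be written out as follows. This is a sketch in our own notation, not quoted from the listing: the weights $λ_1,\ldots,λ_k > 0$ with $\sum_i λ_i = 1$, the means $m_i$, the barycenter mean $\bar m$ and covariance $S$, and the iterates $S_n$ are assumed symbols, and the equations are the form in which the nonlinear matrix equation and its fixed-point iteration are commonly stated for Gaussian barycenters.

```latex
\[
\bar m = \sum_{i=1}^{k} \lambda_i m_i,
\qquad
S = \sum_{i=1}^{k} \lambda_i \left( S^{1/2} \Sigma_i S^{1/2} \right)^{1/2},
\]
\[
S_{n+1} = S_n^{-1/2}
\left( \sum_{i=1}^{k} \lambda_i \left( S_n^{1/2} \Sigma_i S_n^{1/2} \right)^{1/2} \right)^{2}
S_n^{-1/2}.
\]
```

Here $S_0$ is any symmetric positive definite starting matrix, and a fixed point of the iteration satisfies the matrix equation for $S$. In dimension one the iteration reduces to the statement that the barycenter's standard deviation is the weighted mean of the standard deviations.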
-
Similarity of samples and trimming
Authors:
Pedro C. Álvarez-Esteban,
Eustasio del Barrio,
Juan A. Cuesta-Albertos,
Carlos Matrán
Abstract:
We say that two probabilities are similar at level $α$ if they are contaminated versions (up to an $α$ fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.
Submitted 9 May, 2012;
originally announced May 2012.
-
Supervised classification for a family of Gaussian functional models
Authors:
Amparo Baíllo,
Juan Antonio Cuesta-Albertos,
Antonio Cuevas
Abstract:
In the framework of supervised classification (discrimination) for functional data, it is shown that the optimal classification rule can be explicitly obtained for a class of Gaussian processes with "triangular" covariance functions. This explicit knowledge has two practical consequences. First, the consistency of the well-known nearest neighbors classifier (which is not guaranteed in the problems with functional data) is established for the indicated class of processes. Second, and more important, parametric and nonparametric plug-in classifiers can be obtained by estimating the unknown elements in the optimal rule. The performance of these new plug-in classifiers is checked, with positive results, through a simulation study and a real data example.
Submitted 28 April, 2010;
originally announced April 2010.
-
A random-projection based procedure to test if a stationary process is Gaussian
Authors:
Juan A. Cuesta-Albertos,
Fabrice Gamboa,
Alicia Nieto-Reyes
Abstract:
In this paper we address the statistical problem of testing whether a stationary process is Gaussian. The observation consists of a finite sample path of the process. Using a random projection technique introduced and studied in Cuesta-Albertos et al. 2007 in the context of goodness-of-fit tests for functional data, we construct some decision rules. These rules rely on the whole distribution of the process and not only on its marginal distribution at a fixed order. The main idea is to test the Gaussianity of the marginal distribution of some random linear combinations of the process. This leads to consistent decision rules. Some numerical simulations illustrate the pertinence of our approach.
Submitted 18 November, 2009;
originally announced November 2009.
-
Trimming and likelihood: Robust location and dispersion estimation in the elliptical model
Authors:
Juan A. Cuesta-Albertos,
Carlos Matrán,
Agustín Mayo-Iscar
Abstract:
Robust estimators of location and dispersion are often used in the elliptical model to obtain an uncontaminated and highly representative subsample by trimming the data outside an ellipsoid based on the associated Mahalanobis distance. Here we analyze some one (or $k$)-step Maximum Likelihood Estimators computed on a subsample obtained with such a procedure. We introduce different models which arise naturally from the ways in which the discarded data can be treated, leading to truncated or censored likelihoods, as well as to a likelihood based on an only-outliers gross-errors model. Results on existence, uniqueness, robustness and asymptotic properties of the proposed estimators are included. A remarkable fact is that the proposed estimators generally keep the breakdown point of the initial (robust) estimators, but they can improve on the rate of convergence of the initial estimator, because our estimators always converge at rate $n^{1/2}$, independently of the rate of convergence of the initial estimator.
Submitted 4 November, 2008;
originally announced November 2008.