Search | arXiv e-print repository

The complex behaviour of Galton rank order statistic

Authors: E. del Barrio, J. A. Cuesta-Albertos, C. Matrán

Abstract: Galton's rank order statistic is one of the oldest statistical tools for two-sample comparisons. It is also a very natural index to measure departures from stochastic dominance. Yet its asymptotic behaviour has been investigated only partially, under restrictive assumptions. This work provides a comprehensive study of this behaviour, based on the analysis of the so-called contact set (a modification of the set in which the quantile functions coincide). We show that a.s. convergence to the population counterpart holds if and only if the contact set has zero Lebesgue measure. When this set is finite, we show that the asymptotic behaviour is determined by the local behaviour of a suitable reparameterization of the quantile functions in a neighbourhood of the contact points. Regular crossings result in standard rates and Gaussian limiting distributions, but higher-order contacts (in the sense introduced in this work) or contacts at the extremes of the supports may result in different rates and non-Gaussian limits.

Submitted 4 February, 2021; originally announced February 2021.

Comments: 35 pages. No figures

MSC Class: 60E15
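
A minimal Python sketch of the empirical statistic, assuming the quantile-based definition implicit in the abstract: the Lebesgue measure of the set of $t\in(0,1)$ on which the empirical quantile function of the first sample exceeds that of the second. The helper names are hypothetical and the code is not taken from the paper. Exact rational arithmetic (`fractions.Fraction`) is used so that the breakpoints $i/n$ and $j/m$ of the two step functions are handled without rounding.

```python
from fractions import Fraction
import math


def empirical_quantile(sorted_sample, t):
    """Left-continuous empirical quantile F_n^{-1}(t) = x_(ceil(n t)), for 0 < t <= 1."""
    n = len(sorted_sample)
    return sorted_sample[math.ceil(n * t) - 1]


def galton_statistic(x, y):
    """Lebesgue measure of {t in (0,1): F_n^{-1}(t) > G_m^{-1}(t)} (hypothetical helper)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # Both empirical quantile functions are constant on each interval (a, b] between cuts.
    cuts = sorted({Fraction(i, n) for i in range(n + 1)}
                  | {Fraction(j, m) for j in range(m + 1)})
    total = Fraction(0)
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2  # an interior point of (a, b]
        if empirical_quantile(xs, mid) > empirical_quantile(ys, mid):
            total += b - a
    return total


print(galton_statistic([1, 2, 3], [0, 5, 1]))  # prints 2/3
```

For equal sample sizes this reduces to the proportion of indices $i$ with $x_{(i)} > y_{(i)}$; the merged breakpoints only matter when the sizes differ.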

arXiv:2008.09897

On a projection-based class of uniformity tests on the hypersphere

Authors: Eduardo García-Portugués, Paula Navarro-Esteban, Juan A. Cuesta-Albertos

Abstract: We propose a projection-based class of uniformity tests on the hypersphere using statistics that integrate, along all possible directions, the weighted quadratic discrepancy between the empirical cumulative distribution function of the projected data and the projected uniform distribution. Simple expressions for several test statistics are obtained for the circle and sphere, and relatively tractable forms for higher dimensions. Despite its different origin, the proposed class is shown to be related to the well-studied Sobolev class of uniformity tests. Our new class proves advantageous by allowing the derivation of new tests for hyperspherical data that neatly extend the circular tests by Watson, Ajne, and Rothman, and by introducing the first instance of an Anderson-Darling-like test for such data. The asymptotic distributions of the new tests and their local optimality against certain alternatives are obtained. A simulation study evaluates the theoretical findings and shows that, in certain scenarios, the new tests are competitive with previous proposals. The new tests are employed in three astronomical applications.

Submitted 21 September, 2020; v1 submitted 22 August, 2020; originally announced August 2020.

Comments: 26 pages, 3 figures, 6 tables. Supplementary material: 26 pages, 2 figures, 4 tables

MSC Class: 62H11; 62H15
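
As a rough illustration only, the sketch below approximates the integral over directions by Monte Carlo on the sphere $S^2\subset\mathbb{R}^3$, using an unweighted Cramér-von Mises discrepancy. It relies on the classical fact that the projection $\langle X,\gamma\rangle$ of a point $X$ uniform on $S^2$ is uniform on $[-1,1]$. The paper instead derives closed-form expressions; the function names, the number of directions and the choice of weight here are assumptions.

```python
import numpy as np


def cvm_vs_uniform_pm1(proj):
    """Cramér-von Mises statistic of projected data against Uniform[-1, 1]."""
    u = np.sort((proj + 1.0) / 2.0)  # map [-1, 1] onto [0, 1]
    n = u.size
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)


def projected_cvm_s2(X, n_dirs=500, seed=None):
    """Monte Carlo average of the projected discrepancy over random directions (hypothetical)."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # uniform directions on S^2
    return float(np.mean([cvm_vs_uniform_pm1(X @ g) for g in dirs]))


def random_sphere(n, rng):
    """n points uniformly distributed on S^2 (normalized Gaussian vectors)."""
    Z = rng.standard_normal((n, 3))
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


rng = np.random.default_rng(0)
uniform_data = random_sphere(500, rng)
clustered = np.array([0.0, 0.0, 1.0]) + 0.01 * rng.standard_normal((500, 3))
clustered /= np.linalg.norm(clustered, axis=1, keepdims=True)  # points near the north pole
print(projected_cvm_s2(uniform_data, seed=1), projected_cvm_s2(clustered, seed=1))
```

Uniform data give values near the null mean $1/6$ of the Cramér-von Mises statistic, while the concentrated sample gives values of order $n$.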

arXiv:1602.04941

On Perfect Classification and Clustering for Gaussian Processes

Authors: Juan A. Cuesta-Albertos, Subhajit Dutta

Abstract: In this paper, we propose a data-based transformation for infinite-dimensional Gaussian processes and derive its limit theorem. For a classification problem, this transformation induces complete separation among the associated Gaussian processes. The misclassification probability of any simple classifier applied to the transformed data converges asymptotically to zero. In a clustering problem using mixture models, an appropriate modification of this transformation asymptotically leads to perfect separation of the populations. Theoretical properties are studied for the usual $k$-means clustering method when applied to the transformed data. Good empirical performance of the proposed methodology is demonstrated on simulated as well as benchmark data sets, in comparison with some popular parametric and nonparametric methods for such functional data.

Submitted 23 March, 2022; v1 submitted 16 February, 2016; originally announced February 2016.

Comments: 54 pages, 2 figures

MSC Class: 62H30; 60G15

arXiv:1511.05355

A fixed-point approach to barycenters in Wasserstein space

Authors: Pedro C. Álvarez-Esteban, E. del Barrio, J. A. Cuesta-Albertos, C. Matrán

Abstract: Let $\mathcal{P}_{2,ac}$ be the set of Borel probabilities on $\mathbb{R}^d$ with finite second moment that are absolutely continuous with respect to Lebesgue measure. We consider the problem of finding the barycenter (or Fréchet mean) of a finite set of probabilities $\nu_1,\ldots,\nu_k \in \mathcal{P}_{2,ac}$ with respect to the $L_2$-Wasserstein metric. For this task we introduce an operator on $\mathcal{P}_{2,ac}$ related to the optimal transport maps pushing forward any $\mu \in \mathcal{P}_{2,ac}$ to $\nu_1,\ldots,\nu_k$. Under very general conditions we prove that the barycenter must be a fixed point of this operator, and we introduce an iterative procedure that consistently approximates the barycenter. The procedure allows effective computation of barycenters in any location-scatter family, including the Gaussian case. In such cases the barycenter must belong to the family, so it is characterized by its mean and covariance matrix. While its mean is just the weighted mean of the means of the probabilities, the covariance matrix is characterized in terms of their covariance matrices $\Sigma_1,\dots,\Sigma_k$ through a nonlinear matrix equation. The performance of the iterative procedure in this case is illustrated through numerical simulations, which show fast convergence towards the barycenter.

Submitted 22 April, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

Comments: 18 pages, 2 figures

MSC Class: 60B05 (Primary); 47H10; 47J25; 65D99 (Secondary)
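
For Gaussian measures, the nonlinear matrix equation for the barycenter covariance is commonly written as $S=\sum_j w_j (S^{1/2}\Sigma_j S^{1/2})^{1/2}$, with weights $w_j$ summing to one. The sketch below iterates the update $S_{n+1} = S_n^{-1/2}\big(\sum_j w_j (S_n^{1/2}\Sigma_j S_n^{1/2})^{1/2}\big)^2 S_n^{-1/2}$, whose fixed points satisfy that equation. Function names, the identity starting point and the stopping rule are illustrative choices, not taken from the paper.

```python
import numpy as np


def sym_sqrt(A):
    """Principal square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def gaussian_barycenter(means, covs, weights, n_iter=200, tol=1e-12):
    """Wasserstein barycenter of Gaussians N(m_j, Sigma_j) via a fixed-point iteration."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # The barycenter mean is the weighted mean of the means.
    mean = sum(wj * np.asarray(m, dtype=float) for wj, m in zip(w, means))
    S = np.eye(mean.shape[0])  # any positive definite starting point
    for _ in range(n_iter):
        root = sym_sqrt(S)
        root_inv = np.linalg.inv(root)
        T = sum(wj * sym_sqrt(root @ C @ root) for wj, C in zip(w, covs))
        S_new = root_inv @ T @ T @ root_inv
        S_new = (S_new + S_new.T) / 2  # symmetrize against rounding error
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return mean, S


m, S = gaussian_barycenter([[0, 0], [2, 4]],
                           [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])], [0.5, 0.5])
```

For commuting (here diagonal) covariances the barycenter's square root is the weighted mean of the square roots, so the example returns mean $(1,2)$ and covariance $\mathrm{diag}(4,9)$ after a single update.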

arXiv:1205.1950

doi 10.3150/11-BEJ351

Similarity of samples and trimming

Authors: Pedro C. Álvarez-Esteban, Eustasio del Barrio, Juan A. Cuesta-Albertos, Carlos Matrán

Abstract: We say that two probabilities are similar at level $\alpha$ if they are contaminated versions (up to an $\alpha$ fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect, in the sense that trimming beyond the similarity level results in trimmed samples that are closer to each other than expected. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.

Submitted 9 May, 2012; originally announced May 2012.

Comments: Published at http://dx.doi.org/10.3150/11-BEJ351 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ351

Journal ref: Bernoulli 2012, Vol. 18, No. 2, 606-634

arXiv:1004.5031

Supervised classification for a family of Gaussian functional models

Authors: Amparo Baíllo, Juan Antonio Cuesta-Albertos, Antonio Cuevas

Abstract: In the framework of supervised classification (discrimination) for functional data, it is shown that the optimal classification rule can be explicitly obtained for a class of Gaussian processes with "triangular" covariance functions. This explicit knowledge has two practical consequences. First, the consistency of the well-known nearest neighbors classifier (which is not guaranteed in problems with functional data) is established for the indicated class of processes. Second, and more importantly, parametric and nonparametric plug-in classifiers can be obtained by estimating the unknown elements of the optimal rule. The performance of these new plug-in classifiers is checked, with positive results, through a simulation study and a real data example.

Submitted 28 April, 2010; originally announced April 2010.

Comments: 30 pages, 6 figures, 2 tables

MSC Class: 60G15; 60G35; 62G05
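
The nearest neighbors classifier mentioned above can be applied to curves observed on a common grid. A minimal sketch, assuming discretized curves and the $L^2$ distance approximated by the trapezoidal rule; the function names, the grid and $k$ are hypothetical. The example uses Brownian motion, whose covariance $\min(s,t)$ is commonly cited as an instance of the triangular class, with and without a linear drift.

```python
import numpy as np


def knn_classify_curves(train, labels, new, grid, k=5):
    """k-NN labels for curves sampled on `grid` (rows = curves), using the L2 distance."""
    dt = np.diff(grid)
    labels = np.asarray(labels)
    preds = []
    for curve in new:
        sq = (train - curve) ** 2  # squared differences, one row per training curve
        dist = np.sqrt(np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * dt, axis=1))  # trapezoid rule
        nearest = labels[np.argsort(dist)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])  # majority vote among the k neighbours
    return np.array(preds)


def brownian_paths(n, grid, drift, rng):
    """Brownian motion paths on `grid` (starting at 0) plus the linear drift `drift * t`."""
    incr = rng.standard_normal((n, grid.size - 1)) * np.sqrt(np.diff(grid))
    W = np.hstack([np.zeros((n, 1)), np.cumsum(incr, axis=1)])
    return W + drift * grid


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
train = np.vstack([brownian_paths(100, grid, 0.0, rng), brownian_paths(100, grid, 5.0, rng)])
labels = np.repeat([0, 1], 100)
test = np.vstack([brownian_paths(50, grid, 0.0, rng), brownian_paths(50, grid, 5.0, rng)])
accuracy = np.mean(knn_classify_curves(train, labels, test, grid) == np.repeat([0, 1], 50))
print(accuracy)
```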

arXiv:0911.3520

A random-projection based procedure to test if a stationary process is Gaussian

Authors: Juan A. Cuesta-Albertos, Fabrice Gamboa, Alicia Nieto-Reyes

Abstract: In this paper we address the statistical problem of testing whether a stationary process is Gaussian. The observation consists of a finite sample path of the process. Using a random projection technique introduced and studied in Cuesta-Albertos et al. 2007 in the framework of goodness-of-fit tests for functional data, we construct several decision rules. These rules depend on the whole distribution of the process, not only on its marginal distribution at a fixed order. The main idea is to test Gaussianity of the marginal distribution of some random linear combinations of the process. This leads to consistent decision rules. Numerical simulations show the pertinence of our approach.

Submitted 18 November, 2009; originally announced November 2009.

Comments: 31 pages, 1 figure
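
A toy illustration of the projection idea: form random linear combinations $\sum_j h_j X_{t+j}$ of a few consecutive values of the observed path and measure how far the resulting marginal is from Gaussian. The window length, the Gaussian choice of $h$, and the Jarque-Bera moment statistic (which, unlike a proper test for dependent data, makes no correction for serial correlation) are all assumptions made for illustration; they are not the decision rules constructed in the paper.

```python
import numpy as np


def jarque_bera_stat(y):
    """Jarque-Bera moment statistic n/6 * (S^2 + (K - 3)^2 / 4); no dependence correction."""
    z = (y - y.mean()) / y.std()
    skew, kurt = np.mean(z ** 3), np.mean(z ** 4)
    return y.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def projected_nongaussianity(x, window=5, n_proj=10, seed=None):
    """Largest JB statistic over random combinations of `window` consecutive values (hypothetical)."""
    rng = np.random.default_rng(seed)
    # Rows of `lagged` are the vectors (x_t, ..., x_{t+window-1}).
    lagged = np.lib.stride_tricks.sliding_window_view(x, window)
    return max(jarque_bera_stat(lagged @ rng.standard_normal(window)) for _ in range(n_proj))


def ar1(innovations, phi=0.5):
    """Stationary-looking AR(1) path x_t = phi * x_{t-1} + e_t."""
    x = np.empty_like(innovations)
    x[0] = innovations[0]
    for t in range(1, innovations.size):
        x[t] = phi * x[t - 1] + innovations[t]
    return x


rng = np.random.default_rng(0)
gaussian_path = ar1(rng.standard_normal(5000))
skewed_path = ar1(rng.exponential(size=5000) - 1.0)  # centred, non-Gaussian innovations
print(projected_nongaussianity(gaussian_path, seed=1), projected_nongaussianity(skewed_path, seed=1))
```

Because a linear combination of a linear process with exponential innovations keeps positive excess kurtosis, the projected statistic stays large for the skewed path, while the Gaussian path gives values of the order of a $\chi^2_2$ variable.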

arXiv:0811.0503

doi 10.1214/07-AOS541

Trimming and likelihood: Robust location and dispersion estimation in the elliptical model

Authors: Juan A. Cuesta-Albertos, Carlos Matrán, Agustín Mayo-Iscar

Abstract: Robust estimators of location and dispersion are often used in the elliptical model to obtain an uncontaminated and highly representative subsample by trimming the data outside an ellipsoid based on the associated Mahalanobis distance. Here we analyze some one-step (or $k$-step) Maximum Likelihood Estimators computed on a subsample obtained with such a procedure. We introduce different models, which arise naturally from the ways in which the discarded data can be treated, leading to truncated or censored likelihoods, as well as to a likelihood based on an outliers-only gross-errors model. Results on existence, uniqueness, robustness and asymptotic properties of the proposed estimators are included. A remarkable fact is that the proposed estimators generally keep the breakdown point of the initial (robust) estimators, while they may improve on its rate of convergence, because our estimators always converge at rate $n^{1/2}$, independently of the rate of convergence of the initial estimator.

Submitted 4 November, 2008; originally announced November 2008.

Comments: Published at http://dx.doi.org/10.1214/07-AOS541 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS541

MSC Class: 62F35 (Primary); 62F10; 62F12 (Secondary)

Journal ref: Annals of Statistics 2008, Vol. 36, No. 5, 2284-2318
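
A minimal sketch of one reweighting step in the spirit described above, assuming Gaussian data, an initial robust estimate $(m_0, S_0)$ supplied by the user, and trimming at the $\chi^2_p$ quantile $q$ of level $1-\alpha$. For simplicity it uses the sample mean and covariance of the retained points, rescaled by the consistency factor $P(\chi^2_{p+2}\le q)/P(\chi^2_p\le q)$ that holds for a normal distribution truncated to that ellipsoid; this is a simplification, not the truncated or censored likelihood estimators analysed in the paper. The function name and default level are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def one_step_trimmed(X, m0, S0, alpha=0.025):
    """One reweighting step: keep points inside the (1 - alpha) Mahalanobis ellipsoid."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    q = chi2.ppf(1.0 - alpha, df=p)
    diff = X - m0
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S0), diff)  # squared Mahalanobis distances
    kept = X[d2 <= q]
    mean = kept.mean(axis=0)
    cov = np.cov(kept, rowvar=False)
    # Under normality, truncation to the ellipsoid shrinks the covariance by this factor.
    factor = chi2.cdf(q, df=p + 2) / chi2.cdf(q, df=p)
    return mean, cov / factor


rng = np.random.default_rng(0)
clean = rng.standard_normal((20000, 2))
outliers = np.full((1000, 2), 10.0)  # gross errors far outside the ellipsoid
X = np.vstack([clean, outliers])
mean, cov = one_step_trimmed(X, m0=np.zeros(2), S0=np.eye(2))
print(mean, cov)
```

With a good initial estimate the outliers are discarded and the rescaled covariance of the retained points is close to the identity, whereas the raw covariance of the kept points would underestimate it.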

Showing 1–8 of 8 results for author: Cuesta-Albertos, J A