-
Robust Model-Based Clustering
Abstract: We propose a new class of robust and Fisher-consistent estimators for mixture models. These estimators can be used to construct robust model-based clustering procedures. We study in detail the case of multivariate normal mixtures and propose a procedure that uses S estimators of multivariate location and scatter. We develop an algorithm to compute the estimators and to build the clusters which is…
Submitted 8 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.
-
Multivariate Location and Scatter Matrix Estimation Under Cellwise and Casewise Contamination
Abstract: We consider the problem of multivariate location and scatter matrix estimation when the data contain cellwise and casewise outliers. Agostinelli et al. (2015) propose a two-step approach to deal with this problem: first, apply a univariate filter to remove cellwise outliers and second, apply a generalized S-estimator to downweight casewise outliers. We improve this proposal in three main direction…
Submitted 25 December, 2016; v1 submitted 1 September, 2016; originally announced September 2016.
MSC Class: 62G35; 62G05; 62G20
-
Robust regression estimation and inference in the presence of cellwise and casewise contamination
Abstract: Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The proposed method, called three-step regression, proceeds as follows: first, it uses a consistent univariate filter to detect and eliminate extreme cellwise outliers; se…
Submitted 25 December, 2016; v1 submitted 8 September, 2015; originally announced September 2015.
MSC Class: 62G35; 62G05; 62G20
-
Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination
Abstract: Multivariate location and scatter matrix estimation is a cornerstone in multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of variables and a relatively small number of cases are commonplace in modern statistical applications. In these cases global down-weighting of an entire case, as perfor…
Submitted 23 June, 2014; originally announced June 2014.
MSC Class: 62G35 (Primary); 62G05 (Secondary)
-
arXiv:0903.0447
Propagation of outliers in multivariate data
Abstract: We investigate the performance of robust estimates of multivariate location under nonstandard data contamination models such as componentwise outliers (i.e., contamination in each variable is independent from the other variables). This model brings up a possible new source of statistical error that we call "propagation of outliers." This source of error is unusual in the sense that it is generat…
Submitted 3 March, 2009; originally announced March 2009.
Comments: Published at http://dx.doi.org/10.1214/07-AOS588 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS588 MSC Class: 62F35 (Primary) 62H12 (Secondary)
Journal ref: Annals of Statistics 2009, Vol. 37, No. 1, 311-331
-
arXiv:math/0702641
Discussion: Conditional growth charts
Abstract: Discussion of Conditional growth charts [math.ST/0702634]
Submitted 22 February, 2007; originally announced February 2007.
Comments: Published at http://dx.doi.org/10.1214/009053606000000669 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS0102D
Journal ref: Annals of Statistics 2006, Vol. 34, No. 5, 2113-2118
-
arXiv:math/0503665
Robust nonparametric inference for the median
Abstract: We consider the problem of constructing robust nonparametric confidence intervals and tests of hypothesis for the median when the data distribution is unknown and the data may contain a small fraction of contamination. We propose a modification of the sign test (and its associated confidence interval) which attains the nominal significance level (probability coverage) for any distribution in the…
Submitted 29 March, 2005; originally announced March 2005.
Comments: Published at http://dx.doi.org/10.1214/009053604000000634 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Report number: IMS-AOS-AOS283 MSC Class: 62F35 (Primary) 62G35 (Secondary)
Journal ref: Annals of Statistics 2004, Vol. 32, No. 5, 1841-1857
-
arXiv:math/0410079
Uniform asymptotics for robust location estimates when the scale is unknown
Abstract: Most asymptotic results for robust estimates rely on regularity conditions that are difficult to verify in practice. Moreover, these results apply to fixed distribution functions. In the robustness context the distribution of the data remains largely unspecified and hence results that hold uniformly over a set of possible distribution functions are of theoretical and practical interest. Also, it…
Submitted 5 October, 2004; originally announced October 2004.
Comments: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/009053604000000544
Report number: IMS-AOS-AOS235 MSC Class: 62F35; 62F12; 62E20. (Primary)
Journal ref: Annals of Statistics 2004, Vol. 32, No. 4, 1434-1447