-
Robust Model-Based Clustering
Authors:
Juan D. Gonzalez,
Ricardo Maronna,
Victor J. Yohai,
Ruben H. Zamar
Abstract:
We propose a new class of robust and Fisher-consistent estimators for mixture models. These estimators can be used to construct robust model-based clustering procedures. We study in detail the case of multivariate normal mixtures and propose a procedure that uses S-estimators of multivariate location and scatter. We develop an algorithm, quite similar to the EM algorithm, to compute the estimators and build the clusters. An extensive Monte Carlo simulation study shows that our proposal compares favorably with other robust and non-robust model-based clustering procedures. We apply our procedure and alternative ones to a real data set and again find that the best results are obtained using our proposal.
Submitted 8 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Robust multivariate methods in Chemometrics
Authors:
Peter Filzmoser,
Sven Serneels,
Ricardo Maronna,
Christophe Croux
Abstract:
This chapter presents an introduction to robust statistics with applications of a chemometric nature. Following a description of the basic ideas and concepts behind robust statistics, including how robust estimators can be conceived, the chapter builds up to the construction (and use) of robust alternatives for some methods for multivariate analysis frequently used in chemometrics, such as principal component analysis and partial least squares. The chapter then provides an insight into how these robust methods can be used or extended to classification. To conclude, the issue of validating the results is addressed: it is shown how uncertainty statements associated with robust estimates can be obtained.
Submitted 15 June, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Robust principal components for irregularly spaced longitudinal data
Authors:
Ricardo A. Maronna
Abstract:
Consider longitudinal data $x_{ij}$, with $i=1,\ldots,n$ and $j=1,\ldots,p_{i}$, where $x_{ij}$ is the $j$-th observation of the random function $X_{i}(\cdot)$ observed at time $t_{j}$. The goal of this paper is to develop a parsimonious representation of the data by a linear combination of a set of $q$ smooth functions $H_{k}(\cdot)$ ($k=1,\ldots,q$), in the sense that $x_{ij}\approx\mu_{j}+\sum_{k=1}^{q}\beta_{ki}H_{k}(t_{j})$, such that it fulfills three goals: it is resistant to atypical $X_{i}$'s ('case contamination'), it is resistant to isolated gross errors at some $t_{ij}$ ('cell contamination'), and it can be applied when some of the $x_{ij}$ are missing ('irregularly spaced' or 'incomplete' data).
Two approaches will be proposed for this problem. One deals with the three goals stated above, and is based on ideas similar to MM-estimation (Yohai 1987). The other is a simple and fast estimator which can be applied to complete data with case- and cellwise contamination, and is based on applying a standard robust principal components estimate and smoothing the principal directions. Experiments with real and simulated data suggest that with complete data the simple estimator outperforms its competitors, while the MM estimator is competitive for incomplete data.
Keywords: Principal components, MM-estimator, longitudinal data, B-splines, incomplete data.
Submitted 26 March, 2018;
originally announced March 2018.
-
Improving the Peña-Prieto "KSD" procedure
Authors:
Ricardo Maronna
Abstract:
Peña and Prieto (2007) proposed the "Kurtosis plus specific directions" (KSD) method for robust multivariate location and scatter estimation and outlier detection. Maronna and Yohai (2017) employed it as an initial estimator for multivariate S- and MM-estimators, and their simulations showed that KSD generally outperforms initial estimators based on subsampling. However, further simulations show that KSD may become unstable and give wrong results in extreme situations, when the contamination rate is "high" (>=0.2) and the ratio n/p of cases to variables is "low" (<10). Two simple modifications of the procedure are proposed, which greatly improve the method's performance as an initial estimator with only a small increase in computational time.
Submitted 10 August, 2017;
originally announced August 2017.