-
Vector-Valued Gaussian Processes and their Kernels on a Class of Metric Graphs
Authors:
Tobia Filosi,
Emilio Porcu,
Xavier Emery,
Claudio Agostinelli,
Alfredo Alegría
Abstract:
Despite the increasing importance of stochastic processes on linear networks and graphs, the literature on multivariate (vector-valued) Gaussian random fields on metric graphs remains scarce. This paper addresses several aspects related to the construction of proper matrix-valued kernel structures. We start by considering matrix-valued metrics that can be composed with scalar- or matrix-valued functions to implement valid kernels associated with vector-valued Gaussian fields. We then provide conditions under which certain classes of matrix-valued functions can be composed with the univariate resistance metric while ensuring positive semidefiniteness. Special attention is then devoted to Euclidean trees, where a substantial effort is required given the absence of literature on multivariate kernels depending on the $\ell_1$ metric. Hence, we provide a foundational contribution to certain classes of matrix-valued positive semidefinite functions depending on the $\ell_1$ metric. This result is then used to characterise kernels on Euclidean trees with a finite number of leaves. Amongst those, we provide classes of matrix-valued covariance functions that are compactly supported.
Submitted 17 January, 2025;
originally announced January 2025.
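As a point of reference for the kernels discussed above, the following is a minimal sketch of the simplest valid construction, not one of the paper's new classes: a separable kernel $C(x,y) = A\exp(-d(x,y)/s)$, where $A$ is a fixed positive semidefinite coregionalisation matrix and $d$ is the geodesic distance on a tree. Tree metrics embed isometrically in $\ell_1$, and $\exp(-\|x-y\|_1/s)$ is positive semidefinite, so its Kronecker product with $A$ is positive semidefinite as well. The star-shaped tree, its edge lengths, the matrix $A$ and the function names are hypothetical, and only the nodes of the tree are evaluated.

```python
import numpy as np

# Hypothetical star-shaped tree: centre node 0 and three leaves.
# Each edge is (node_u, node_v, length).
EDGES = [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 0.5)]


def tree_distances(edges, n_nodes):
    """Geodesic distances between the nodes of an edge-weighted tree."""
    d = np.full((n_nodes, n_nodes), np.inf)
    np.fill_diagonal(d, 0.0)
    for u, v, length in edges:
        d[u, v] = d[v, u] = length
    # Floyd-Warshall relaxation; on a tree this recovers the unique path length.
    for k in range(n_nodes):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d


def separable_kernel(dist, coreg, scale=1.0):
    """Covariance of an m-variate field at n points under C = coreg * exp(-d / scale).

    Returns an (n*m) x (n*m) matrix whose (i, j) block is exp(-d_ij / scale) * coreg.
    """
    return np.kron(np.exp(-dist / scale), coreg)


D = tree_distances(EDGES, 4)
A = np.array([[1.0, 0.6], [0.6, 1.0]])  # hypothetical PSD coregionalisation matrix
C = separable_kernel(D, A)
print(D[1, 2])                               # 3.0 (path 1 -> 0 -> 2)
print(np.linalg.eigvalsh(C).min() > -1e-10)  # True: numerically PSD
```

The eigenvalue check is only a numerical sanity test on one configuration of points; it does not replace the positive semidefiniteness results of the paper.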
-
Robust Elastic Net Estimators for High Dimensional Generalized Linear Models
Authors:
Marina Valdora,
Claudio Agostinelli
Abstract:
Robust estimators for Generalized Linear Models (GLMs) are not easy to develop because of the nature of the distributions involved. Recently, there has been increasing interest in this topic, especially in the presence of a possibly large number of explanatory variables. Transformed M-estimators (MT) are a natural way to extend the methodology of M-estimators to the class of GLMs and to obtain robust methods. We introduce a penalized version of MT-estimators in order to deal with high-dimensional data. We prove, under appropriate assumptions, consistency and asymptotic normality of this new class of estimators. The theory is developed for redescending $ρ$-functions and Elastic Net penalization. An iteratively reweighted least squares algorithm is given, together with a procedure to initialize it. The latter is of particular importance, since the estimating equations might have multiple roots. We illustrate the performance of this new method for the Poisson family under several types of contamination in a Monte Carlo experiment and in an example based on a real dataset.
Submitted 7 December, 2023;
originally announced December 2023.
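To make the penalized MT-estimator concrete, here is a minimal sketch of an objective of that type for the Poisson family, under several assumptions not stated in the abstract: the transformation $t(y)=\sqrt{y}$, a log link, Tukey's bisquare as the redescending $ρ$, and the parametrisation $\lambda(\alpha\|\beta\|_1 + \tfrac{1-\alpha}{2}\|\beta\|_2^2)$ of the Elastic Net penalty. The response is compared with $m(\mu) = E[\sqrt{Y}]$ for $Y \sim \text{Poisson}(\mu)$, computed by a truncated series. The tuning constant `c`, the toy data and all function names are illustrative.

```python
import math

import numpy as np


def bisquare_rho(u, c):
    """Tukey's bisquare rho scaled to [0, 1]; constant (equal to 1) for |u| >= c."""
    z = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - z**2) ** 3


def poisson_sqrt_mean(mu):
    """m(mu) = E[sqrt(Y)] for Y ~ Poisson(mu), mu > 0, via a truncated series."""
    k_max = int(mu + 20.0 * math.sqrt(mu) + 20)
    total = 0.0
    for k in range(1, k_max + 1):  # the k = 0 term is zero
        log_pmf = -mu + k * math.log(mu) - math.lgamma(k + 1)
        total += math.sqrt(k) * math.exp(log_pmf)
    return total


def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    beta = np.asarray(beta, dtype=float)
    return lam * (alpha * np.abs(beta).sum() + 0.5 * (1.0 - alpha) * np.sum(beta**2))


def penalized_mt_objective(beta, X, y, lam, alpha, c=2.0):
    """Mean rho-loss of sqrt(y_i) - m(exp(x_i' beta)) plus the Elastic Net penalty.

    Log link assumed; every coefficient (intercept included) is penalised here.
    """
    eta = X @ beta
    fitted = np.array([poisson_sqrt_mean(math.exp(e)) for e in eta])
    loss = bisquare_rho(np.sqrt(y) - fitted, c).mean()
    return loss + elastic_net_penalty(beta, lam, alpha)


# Hypothetical toy data: intercept column plus one covariate; the last count is outlying.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 50.0])
beta = np.array([0.0, 0.5])
value = penalized_mt_objective(beta, X, y, lam=0.1, alpha=0.5)
print(value)
```

Because $ρ$ is bounded, the outlying count contributes at most 1 to the loss. The abstract's estimator is computed with an iteratively reweighted least squares algorithm and a dedicated initial estimate; neither is reproduced here, and penalising the intercept is only a simplification.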
-
Robust Estimation for Multivariate Wrapped Models
Authors:
Giovanni Saraceno,
Claudio Agostinelli,
Luca Greco
Abstract:
A weighted likelihood technique is proposed for robust estimation of a multivariate Wrapped Normal distribution for data points scattered on a p-dimensional torus. The occurrence of outliers in the sample at hand can badly compromise inference for standard techniques such as the maximum likelihood method. Such model inadequacies therefore need to be handled in the fitting process by a robust technique that effectively down-weights observations not following the assumed model. Furthermore, the use of a robust method could help in situations with hidden and unexpected substructures in the data. Here, a set of data-dependent weights is built from the Pearson residuals, and the corresponding weighted likelihood estimating equations are solved. In particular, robust estimation is carried out using a Classification EM algorithm whose M-step is enhanced by the computation of weights based on the current parameter values. The finite-sample behavior of the proposed method is investigated through Monte Carlo numerical studies and real data examples.
Submitted 16 October, 2020;
originally announced October 2020.
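The weighting step can be illustrated in the simplest case, $p = 1$ (a single circular variable). This minimal sketch evaluates the wrapped normal density by truncating its wrapping sum and computes weights from Pearson residuals $\delta = \hat f/m_\theta - 1$, assuming the Hellinger residual adjustment function $A(\delta) = 2(\sqrt{\delta+1}-1)$ and weights $w(\delta) = \min\{1, [A(\delta)+1]^+/(\delta+1)\}$. The choice of residual adjustment function, the truncation level and the function names are assumptions; the smoothing of $\hat f$ and $m_\theta$ and the Classification EM iterations are omitted.

```python
import numpy as np


def wrapped_normal_pdf(theta, mu, sigma, n_wraps=10):
    """Wrapped normal density on the circle, truncating the sum over 2*pi*k shifts."""
    theta = np.asarray(theta, dtype=float)
    k = np.arange(-n_wraps, n_wraps + 1)
    z = (theta[..., None] - mu + 2.0 * np.pi * k) / sigma
    return np.exp(-0.5 * z**2).sum(axis=-1) / (sigma * np.sqrt(2.0 * np.pi))


def pearson_residual(f_hat, model_density):
    """delta = f_hat / m_theta - 1 (data-based density over model density, minus one)."""
    return np.asarray(f_hat, dtype=float) / np.asarray(model_density, dtype=float) - 1.0


def hellinger_weight(delta):
    """min(1, [A(delta) + 1]^+ / (delta + 1)) with A(delta) = 2 (sqrt(delta + 1) - 1).

    Defined for delta > -1; large positive residuals (outlying regions) get small weights.
    """
    delta = np.asarray(delta, dtype=float)
    raf = 2.0 * (np.sqrt(delta + 1.0) - 1.0)
    return np.minimum(1.0, np.maximum(raf + 1.0, 0.0) / (delta + 1.0))


# Sanity check: the truncated density integrates to one over [0, 2*pi).
grid = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
total_mass = wrapped_normal_pdf(grid, mu=0.5, sigma=1.0).sum() * (2.0 * np.pi / 2000)
print(round(float(total_mass), 6))  # 1.0

print(hellinger_weight([0.0, 3.0, 99.0]))
```

A point where the data are exactly as dense as the model ($\delta = 0$) keeps full weight, while $\delta = 3$ (four times the model density) gives weight 0.75 and $\delta = 99$ gives 0.19.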
-
Analytical and statistical properties of local depth functions motivated by clustering applications
Authors:
Giacomo Francisci,
Claudio Agostinelli,
Alicia Nieto-Reyes,
Anand N. Vidyashankar
Abstract:
Local general depth ($LGD$) functions are used for describing the local geometric features and mode(s) of multivariate distributions. In this paper, we undertake a rigorous and systematic study of $LGD$ and establish several analytical and statistical properties. First, we show that, when the underlying probability distribution is absolutely continuous with density $f(\cdot)$, the scaled version of $LGD$ (referred to as $τ$-approximation) converges, uniformly and in $L^d(\mathbb{R}^p)$, to $f(\cdot)$ as $τ$ converges to zero. Second, we establish that, as the sample size diverges to infinity, the centered and scaled sample $LGD$ converges in distribution to a centered Gaussian process uniformly in the space of bounded functions on $\mathcal{H}_G$, a class of functions yielding $LGD$. Third, using the sample version of the $τ$-approximation ($SτA$) and gradient system analysis, we develop a new clustering algorithm. The validity of this algorithm requires several results concerning the uniform finite difference approximation of the gradient system associated with $SτA$. For this reason, we establish a Bernstein-type inequality for deviations between the centered and scaled sample $LGD$, which is also of independent interest. Finally, invoking the above results, we establish consistency of the clustering algorithm. Applications of the proposed methods to mode estimation and upper level set estimation are also provided. The finite sample performance of the methodology is evaluated using numerical experiments and data analysis.
Submitted 3 November, 2022; v1 submitted 27 August, 2020;
originally announced August 2020.
-
Robust Multivariate Estimation Based On Statistical Depth Filters
Authors:
Giovanni Saraceno,
Claudio Agostinelli
Abstract:
In classical contamination models, such as the gross-error model (Huber and Tukey contamination model, or case-wise contamination), observations are the units to be identified as outliers or not. This model is very useful when the number of variables is moderately small. Alqallaf et al. [2009] showed the limits of this approach for a larger number of variables and introduced the Independent contamination model (cell-wise contamination), where the cells are the units to be identified as outliers or not. One approach to dealing with both types of contamination at the same time is to filter out the contaminated cells from the data set and then apply a robust procedure able to handle case-wise outliers and missing values. Here we develop a general framework to build filters in any dimension based on statistical data depth functions. We show that previous approaches, e.g. Agostinelli et al. [2015a] and Leung et al. [2017], are special cases. We illustrate our method by using the half-space depth.
Submitted 16 January, 2021; v1 submitted 10 September, 2019;
originally announced September 2019.
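The half-space depth used to illustrate the filters can be approximated in two dimensions as follows. This minimal sketch takes, over a finite grid of directions, the smallest fraction of sample points lying in a closed half-plane whose boundary passes through the point; with finitely many directions it returns an upper bound on the exact depth. The grid size, the toy sample and the function name are assumptions, and the sketch computes only the depth, not the filtering rule of the paper.

```python
import numpy as np


def halfspace_depth_2d(x, data, n_directions=360):
    """Approximate Tukey (half-space) depth of point x w.r.t. a 2-D sample.

    For each unit direction u, count sample points y with u.(y - x) >= 0;
    the depth is the smallest such count divided by the sample size.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    u = np.column_stack([np.cos(angles), np.sin(angles)])  # (n_directions, 2)
    proj = (np.asarray(data, dtype=float) - np.asarray(x, dtype=float)) @ u.T
    counts = (proj >= 0).sum(axis=0)  # points in each closed half-plane
    return counts.min() / len(data)


square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
print(halfspace_depth_2d([0.0, 0.0], square))    # 0.5
print(halfspace_depth_2d([10.0, 10.0], square))  # 0.0
```

For the four corners of a square, the centre has depth 0.5 and a distant point has depth 0; a depth-based filter works with values of this kind rather than with raw distances.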
-
Generalization of the simplicial depth: no vanishment outside the convex hull of the distribution support
Authors:
Giacomo Francisci,
Alicia Nieto-Reyes,
Claudio Agostinelli
Abstract:
The simplicial depth, like other relevant multivariate statistical data depth functions, vanishes right outside the convex hull of the support of the distribution with respect to which the depth is computed. This is problematic when points outside the convex hull of the distribution support need to be differentiated on the basis of their depth values. We provide the first proposal of a simplicial depth that does not vanish right outside the convex hull of the distribution. The properties of the proposal and of the corresponding estimator are studied theoretically and by means of Monte Carlo simulations and analysis of datasets.
Submitted 11 January, 2024; v1 submitted 6 September, 2019;
originally announced September 2019.
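The vanishing problem is easy to see on the classical sample simplicial depth of Liu (1990), which this minimal sketch computes in two dimensions as the fraction of triangles with vertices in the sample that contain the point (triangles taken as closed sets). It is the classical depth, not the proposal of the paper; the sample and the function names are hypothetical.

```python
from itertools import combinations


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_closed_triangle(p, a, b, c):
    """True if point p lies in the closed triangle abc, whatever its orientation."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def sample_simplicial_depth(p, data):
    """Fraction of triangles with vertices among the sample points that contain p."""
    triangles = list(combinations(data, 3))
    hits = sum(in_closed_triangle(p, *tri) for tri in triangles)
    return hits / len(triangles)


sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(sample_simplicial_depth((0.25, 0.5), sample))  # 0.5
print(sample_simplicial_depth((1.5, 1.5), sample))   # 0.0
print(sample_simplicial_depth((5.0, 5.0), sample))   # 0.0
```

Both (1.5, 1.5), just outside the convex hull, and (5, 5), far away, receive depth 0, so the classical depth cannot rank them.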
-
Composite Robust Estimators for Linear Mixed Models
Authors:
Claudio Agostinelli,
Victor J. Yohai
Abstract:
The Classical Tukey-Huber Contamination Model (CCM) is the usual framework for describing the mechanism of outlier generation in robust statistics. In a data set with $n$ observations and $p$ variables, under the CCM, an outlier is a whole unit, even if only one or a few of its values are corrupted. Classical robust procedures were designed to cope with this setting, limiting the impact of observations whenever necessary. Recently, a different mechanism of outlier generation, namely the Independent Contamination Model (ICM), was introduced. In this new setting each cell of the data matrix might be corrupted or not with a probability independent of the status of the other cells. The ICM poses new challenges to robust statistics, since the percentage of contaminated rows increases dramatically with $p$, often reaching more than $50\%$. When this situation appears, classical affine equivariant robust procedures do not work, since their breakdown point is at most $50\%$. For this contamination model we propose a new type of robust method, namely composite robust procedures, inspired by the idea of composite likelihood, where low-dimensional likelihoods, very often the likelihoods of pairs, are aggregated to obtain a more tractable approximation of the full likelihood. Our composite robust procedures are built over pairs of observations in order to gain robustness in the independent contamination model. We propose composite S and $τ$-estimators for linear mixed models. Composite $τ$-estimators are proved to have a high breakdown point both in the CCM and in the ICM. A Monte Carlo study shows that our estimators compare favorably with classical S-estimators under the CCM and outperform them under the ICM. An example based on a real data set illustrates the new robust procedure.
Submitted 14 July, 2014; v1 submitted 8 July, 2014;
originally announced July 2014.
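The arithmetic behind the claim that the share of contaminated rows grows quickly with $p$ is short: if each cell is contaminated independently with probability $\varepsilon$, a row of $p$ cells is clean with probability $(1-\varepsilon)^p$. The sketch below, with the hypothetical value $\varepsilon = 0.05$, shows that more than half of the rows are contaminated once $p \ge 14$, while a pair of cells is contaminated with probability $1-(1-\varepsilon)^2 = 0.0975$, which is what makes procedures built over pairs attractive. Function names are illustrative.

```python
def prob_row_contaminated(eps, p):
    """P(a row of p cells has at least one contaminated cell) under independent
    cellwise contamination with probability eps per cell."""
    return 1.0 - (1.0 - eps) ** p


def smallest_p_above(eps, level=0.5):
    """Smallest number of variables p for which more than `level` of the rows are contaminated."""
    if not (0.0 < eps <= 1.0 and 0.0 <= level < 1.0):
        raise ValueError("need 0 < eps <= 1 and 0 <= level < 1")
    p = 1
    while prob_row_contaminated(eps, p) <= level:
        p += 1
    return p


eps = 0.05  # hypothetical cellwise contamination rate
print(round(prob_row_contaminated(eps, 2), 4))   # 0.0975 (a pair of cells)
print(round(prob_row_contaminated(eps, 20), 4))  # 0.6415
print(smallest_p_above(eps))                     # 14
```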
-
Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination
Authors:
Claudio Agostinelli,
Andy Leung,
Victor J. Yohai,
Ruben H. Zamar
Abstract:
Multivariate location and scatter matrix estimation is a cornerstone of multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of variables and a relatively small number of cases are commonplace in modern statistical applications. In these cases, global down-weighting of an entire case, as performed by traditional robust procedures, may lead to poor results. We highlight the need for a new generation of robust estimators that can efficiently deal with cellwise outliers and at the same time show good performance under casewise outliers.
Submitted 23 June, 2014;
originally announced June 2014.