-
Directional density-based clustering
Authors:
Paula Saavedra-Nieves,
Martín Fernández-Pérez
Abstract:
Density-based clustering methodology has been widely considered in the statistical literature for classifying Euclidean observations. However, this approach has not yet been considered for directional data. In this work, directional density-based clustering methodology is fully established for the unit hypersphere by solving the computational problems associated with high-dimensional spaces. We also provide a circular and spherical exploratory tool for studying the effect of the smoothing parameter when kernel density estimation methods are considered. An extensive simulation study shows the performance of the resulting classification procedure for the circle and for the sphere. The methodology is also applied to analyse an exoplanet dataset.
Submitted 3 March, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
On systems of quotas based on bankruptcy with a priori unions: estimating random arrival-style rules
Authors:
A. Saavedra-Nieves,
P. Saavedra-Nieves
Abstract:
This paper addresses a sampling procedure for estimating extensions of the random arrival rule to those bankruptcy situations where there exist a priori unions. It is based on simple random sampling with replacement and it adapts an estimation method of the Owen value for transferable utility games with a priori unions, especially useful when the set of involved agents is sufficiently large. We analyse the theoretical statistical properties of the resulting estimator and provide bounds for the incurred error. Its performance is evaluated on two well-studied examples in the literature where this allocation rule can be obtained exactly. Finally, we apply this sampling method to provide a new quota system for the milk market in Galicia (Spain) and to check the role of different territorial structures when they are taken as a priori unions. The resulting quota estimator is also compared with two classical rules in the bankruptcy literature.
Submitted 16 November, 2020;
originally announced November 2020.
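The sampling idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes the extension in which arrival orders keep the members of each union together (as in sampling schemes for the Owen value), and the function name `random_arrival_estimate`, the claims, and the unions are hypothetical.

```python
import random

def random_arrival_estimate(estate, claims, unions, n_samples=10_000, seed=0):
    """Monte Carlo estimate of a random arrival-style rule with a priori unions.

    claims: one claim per agent (agents are indexed 0..n-1)
    unions: list of lists partitioning the agent indices
    """
    rng = random.Random(seed)
    totals = [0.0] * len(claims)
    for _ in range(n_samples):
        # Draw an arrival order compatible with the unions (assumption):
        # shuffle the unions, then shuffle the agents inside each union.
        # Orders are drawn independently, i.e. sampled with replacement.
        order = []
        for union in rng.sample(unions, len(unions)):
            order.extend(rng.sample(union, len(union)))
        remaining = estate
        for agent in order:
            # Each arriving agent is paid in full until the estate runs out.
            payment = min(claims[agent], remaining)
            totals[agent] += payment
            remaining -= payment
    return [t / n_samples for t in totals]

# Hypothetical bankruptcy problem: estate 200, three claimants, two unions.
claims = [100.0, 200.0, 300.0]
unions = [[0, 1], [2]]
quotas = random_arrival_estimate(200.0, claims, unions)
```

Every sampled order distributes the whole estate, so the estimated quotas add up to the estate regardless of the number of samples; only the split between agents is subject to sampling error.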
-
Nonparametric estimation of highest density regions for COVID-19
Authors:
Paula Saavedra-Nieves
Abstract:
Highest density regions refer to level sets containing points of relatively high density. Their estimation from a random sample, generated from the underlying density, makes it possible to determine the clusters of the corresponding distribution. This task can be accomplished from different nonparametric perspectives. From a practical point of view, reconstructing highest density regions can be interpreted as a way of determining hot-spots, a crucial task for understanding COVID-19 space-time evolution. In this work, we compare the behavior of classical plug-in methods and a recently proposed hybrid algorithm for highest density region estimation through an extensive simulation study. Both methodologies are applied to analyze a real data set about COVID-19 cases in the United States.
Submitted 20 November, 2020; v1 submitted 27 October, 2020;
originally announced October 2020.
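A plug-in highest density region of the kind mentioned in the abstract above can be sketched in one dimension. The sketch rests on assumptions not taken from the paper: a Gaussian kernel, Silverman's rule-of-thumb bandwidth, a threshold taken as an empirical quantile of the estimated density at the sample points, and simulated data. All function names are hypothetical.

```python
import math
import random
import statistics

def gaussian_kde(sample, bandwidth):
    """Return a function evaluating the Gaussian kernel density estimate."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm
    return density

def hdr_threshold(sample, density, tau):
    """Level whose upper level set has estimated probability content about 1 - tau."""
    values = sorted(density(xi) for xi in sample)
    return values[int(tau * len(values))]

# Simulated bimodal sample (assumption: an equal-weight Gaussian mixture).
rng = random.Random(1)
sample = [rng.gauss(-2.0, 0.5) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
          for _ in range(300)]

# Silverman's rule of thumb, used here only as a simple default selector.
bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
density = gaussian_kde(sample, bandwidth)
threshold = hdr_threshold(sample, density, tau=0.2)

# Sample points falling inside the estimated highest density region.
hot_spots = [x for x in sample if density(x) >= threshold]
```

The estimated region is the set where `density` is at least `threshold`; with `tau=0.2` it is built to hold roughly 80% of the sample.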
-
Nonparametric estimation of directional highest density regions
Authors:
Paula Saavedra-Nieves,
Rosa María Crujeiras
Abstract:
Reconstruction of sets from a random sample of points intimately related to them is the goal of set estimation theory. Within this context, a particular problem is the reconstruction of density level sets and, specifically, of those with a high probability content, namely highest density regions.
We define highest density regions for directional data and provide a plug-in estimator based on kernel smoothing. A suitable bootstrap bandwidth selector is provided for the practical implementation of the proposal. An extensive simulation study shows the performance of the proposed plug-in estimator with the bootstrap bandwidth selector and with other bandwidth selectors specifically designed for circular and spherical kernel density estimation. The methodology is applied to analyze two real data sets in animal orientation and seismology.
Submitted 5 November, 2020; v1 submitted 18 September, 2020;
originally announced September 2020.
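For circular data, the plug-in construction in the abstract above can be sketched with a von Mises kernel. This is an illustration under stated assumptions, not the paper's procedure: the concentration `kappa` is fixed by hand rather than chosen by the bootstrap selector, the threshold is an empirical quantile of the estimated density at the sample angles, and the data are simulated. All function names are hypothetical.

```python
import math
import random

def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def von_mises_kde(angles, kappa):
    """Circular kernel density estimate with a von Mises kernel."""
    norm = len(angles) * 2.0 * math.pi * bessel_i0(kappa)
    def density(theta):
        return sum(math.exp(kappa * math.cos(theta - a)) for a in angles) / norm
    return density

def hdr_threshold(angles, density, tau):
    """Level whose upper level set has estimated probability content about 1 - tau."""
    values = sorted(density(a) for a in angles)
    return values[int(tau * len(values))]

# Simulated angles in radians (assumption: a single von Mises population).
rng = random.Random(2)
angles = [rng.vonmisesvariate(math.pi, 4.0) for _ in range(200)]

kappa = 10.0  # hand-picked concentration, playing the role of the smoothing parameter
density = von_mises_kde(angles, kappa)
threshold = hdr_threshold(angles, density, tau=0.2)

# Sample angles falling inside the estimated highest density region.
hdr_angles = [a for a in angles if density(a) >= threshold]
```

Larger values of `kappa` give a more concentrated kernel and hence less smoothing, so the estimated region can break into more arcs as `kappa` grows.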
-
Extent of occurrence reconstruction using a new data-driven support estimator
Authors:
A. Rodríguez-Casal,
P. Saavedra-Nieves
Abstract:
Given a random sample of points from some unknown distribution, we propose a new data-driven method for estimating its probability support S. Under the mild assumption that S is r-convex, the smallest r-convex set which contains the sample points is the natural estimator. The main obstacle to using this estimator in practice is that r is an unknown geometric characteristic of the set S. A stochastic algorithm is proposed for determining an optimal estimate of r from the data under mild regularity assumptions on the density function. The resulting data-driven reconstruction of S attains the same convergence rates as the convex hull for estimating convex sets, but under a much more flexible smoothness shape condition. The new support estimator is used for reconstructing the extent of occurrence of an assemblage of invasive plant species in the Azores archipelago.
Submitted 19 July, 2019;
originally announced July 2019.
-
Minimax Hausdorff estimation of density level sets
Authors:
Alberto Rodríguez-Casal,
Paula Saavedra-Nieves
Abstract:
Given a random sample of points from some unknown density, we propose a data-driven method for estimating density level sets under the r-convexity assumption. This shape condition generalizes the convexity property. However, the main problem in practice is that r is an unknown geometric characteristic of the set related to its curvature. A stochastic algorithm is proposed for selecting its optimal value from the data. The resulting reconstruction of the level set achieves minimax rates for the Hausdorff metric and the distance in measure, up to log factors, uniformly in the level of the set.
Submitted 7 May, 2019;
originally announced May 2019.
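The Hausdorff metric named in the abstract above measures how far two sets are from each other in the worst case. The paper works with general sets; the sketch below only covers the simpler case of two finite point clouds, and the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Hausdorff distance between two finite, non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two hypothetical point clouds in the plane.
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0)]
distance = hausdorff(set_a, set_b)
```

Taking the maximum over both directions makes the distance symmetric: `set_b` lies entirely inside `set_a`, yet the distance is positive because the point `(1.0, 0.0)` has no close counterpart in `set_b`.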
-
A fully data-driven method for estimating the shape of a point cloud
Authors:
Alberto Rodríguez-Casal,
Paula Saavedra-Nieves
Abstract:
Given a random sample of points from some unknown distribution, we propose a new data-driven method for estimating its probability support $S$. Under the mild assumption that $S$ is $r$-convex, the smallest $r$-convex set which contains the sample points is the natural estimator. The main obstacle to using this estimator in practice is that $r$ is an unknown geometric characteristic of the set $S$. A stochastic algorithm is proposed for selecting it from the data under the hypothesis that the sample is uniformly generated. The new data-driven reconstruction of $S$ achieves the same convergence rates as the convex hull for estimating convex sets, but under a much more flexible smoothness shape condition. The practical performance of the estimator is illustrated through a real data example and a simulation study.
Submitted 27 November, 2014; v1 submitted 29 April, 2014;
originally announced April 2014.
-
A comparative simulation study of data-driven methods for estimating density level sets
Authors:
Paula Saavedra-Nieves,
Wenceslao González-Manteiga,
Alberto Rodríguez-Casal
Abstract:
Density level sets are mainly estimated using one of three methodologies: plug-in, excess mass, or a hybrid approach. The plug-in methods are based on replacing the unknown density by some nonparametric estimator, usually a kernel estimator. Thus, bandwidth selection is a fundamental problem from a practical point of view. Recently, specific selectors for level sets have been proposed. However, if some a priori information about the geometry of the level set is available, then excess mass algorithms can be useful. In this case, a density estimator is not necessary, and the problem of bandwidth selection can be avoided. The third methodology is a hybrid of the others. As in the excess mass method, it assumes a mild geometric restriction on the level set and, like the plug-in approach, requires a pilot nonparametric estimator of the density. One interesting open question concerns the practical performance of these methods. In this work, existing methods are reviewed, and two new hybrid algorithms are proposed. Their practical behaviour is compared through extensive simulations.
Submitted 5 March, 2014; v1 submitted 3 February, 2014;
originally announced February 2014.